RecRecNet: Fisheye Rectangling Network
- RecRecNet is a neural architecture that rectifies fisheye distortions while ensuring a clean, rectangular image boundary.
- It employs a thin-plate spline module combined with degree-of-freedom curriculum learning to enable unsupervised, end-to-end rectangling.
- Experiments demonstrate improved PSNR, SSIM, and detection accuracy, benefiting downstream vision models.
RecRecNet is a neural architecture designed to address a geometric side effect of wide-angle (fisheye) image rectification: the difficulty of preserving both undistorted scene content and a clean, rectangular image boundary. The network integrates a thin-plate spline (TPS) mesh deformation module with degree-of-freedom (DoF)-based curriculum learning to achieve unsupervised, end-to-end rectangling of already-rectified imagery with warped boundaries. It targets a shortcoming of conventional rectification methods, which remove content distortion effectively but yield images with highly irregular, non-rectangular boundaries that are detrimental to standard vision pipelines (Liao et al., 2023).
1. Problem Motivation and Background
Wide-angle optics, particularly fisheye lenses, introduce severe radial distortions that manifest as nonlinear warping of lines and geometry. Rectification pipelines, whether parametric or learning-based, remap individual pixels toward the optical axis to restore straight lines in the imaged content. However, these corrections compress the image non-uniformly, so the interior of the output is visually plausible but the border becomes warped, often resembling a bulge toward the image center. For high-level tasks relying on standard convolutional neural networks, which assume rectangular input tensors, these geometrically irregular outputs introduce artificial boundaries from zero-padding, leading to feature discontinuities, hallucinated edges, and degraded detection or segmentation accuracy, particularly near the periphery.
RecRecNet is formulated to generate rectified outputs that are both geometrically faithful in the content and strictly bounded by a rectangle—retaining compatibility with downstream models (Liao et al., 2023).
2. Network Architecture
RecRecNet operates on an input image with warped boundaries and predicts a non-rigid deformation field to both recover a rectangular boundary and preserve scene integrity. The main architectural elements are:
- Feature Extraction: ResNet-50 acts as the backbone, generating a deep feature map.
- Motion Header: Four convolutional layers (each with 512 channels) are followed by three fully-connected layers, which regress the source coordinates of the control points.
- TPS Rectangling Module: The predicted control points, together with a fixed, rectangular target grid, are input to a thin-plate spline interpolator to estimate a smooth, differentiable warp. This is implemented as a grid-sampler for end-to-end backpropagation.
- Curriculum Learning: The model is trained in three progression stages, with increasing transformation complexity.
The network is explicitly unsupervised with respect to geometric deformation, learning to align content and boundary via proxy losses and curriculum (Liao et al., 2023).
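The interface between the motion header and the TPS module can be sketched as follows. This is a minimal illustration, not the authors' code: the grid resolution (3x3), the normalized [-1, 1] coordinate convention, the interleaved (x, y) output layout, and the helper names `make_target_grid` and `decode_head_output` are all assumptions made for the example.

```python
import numpy as np

def make_target_grid(rows: int, cols: int) -> np.ndarray:
    """Fixed rectangular grid of control points in normalized [-1, 1] coordinates.

    Returns an array of shape (rows * cols, 2) holding (x, y) pairs in row-major order.
    """
    xs = np.linspace(-1.0, 1.0, cols)
    ys = np.linspace(-1.0, 1.0, rows)
    gx, gy = np.meshgrid(xs, ys)  # each has shape (rows, cols)
    return np.stack([gx.ravel(), gy.ravel()], axis=1)

def decode_head_output(flat: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Reshape a flat regression vector (x0, y0, x1, y1, ...) into (N, 2) control points."""
    expected = rows * cols * 2
    if flat.shape != (expected,):
        raise ValueError(f"expected {expected} values, got shape {flat.shape}")
    return flat.reshape(rows * cols, 2)

# Hypothetical 3x3 grid; the real grid resolution is not stated here.
target = make_target_grid(3, 3)

# Stand-in for the regressed vector: the target grid plus a small perturbation.
rng = np.random.default_rng(0)
fake_output = target.ravel() + 0.05 * rng.standard_normal(target.size)
predicted = decode_head_output(fake_output, 3, 3)
```

In a full model, `predicted` and `target` would be the two point sets handed to the TPS interpolator.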
3. Thin-Plate Spline Rectangling
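The core geometric operator in RecRecNet is a 2D thin-plate spline (TPS) deformation, which provides a globally smooth, locally flexible warp between the control points predicted on the source image and those of a regular rectangular grid. In its standard form, the TPS transformation is parameterized as:
$$f(p) = A p + b + \sum_{i=1}^{N} w_i \, \phi(\lVert p - p_i \rVert)$$
where $\phi(r) = r^2 \log r$ is the TPS radial basis, $A$ and $b$ define affine behavior, and $w_i$ are coefficients chosen so that $f(p_i) = q_i$ for every pair of corresponding control points $(p_i, q_i)$ while minimizing bending energy
$$E(f) = \iint \left( f_{xx}^2 + 2 f_{xy}^2 + f_{yy}^2 \right) \, dx \, dy$$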
The TPS mapping is analytically solvable as a small linear system, yielding a dense, differentiable spatial transformation suitable for mesh deformation in an end-to-end network (Liao et al., 2023).
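The "small linear system" can be made concrete with a NumPy sketch of standard TPS fitting. It uses the textbook formulation (kernel r^2 log r plus an affine part, with the usual side conditions on the radial weights) and is not taken from the RecRecNet implementation; the function names and the 3x3 example grid are illustrative assumptions.

```python
import numpy as np

def tps_kernel(r2: np.ndarray) -> np.ndarray:
    """Radial basis U(r) = r^2 log r, computed from squared distances, with U(0) = 0."""
    out = np.zeros_like(r2, dtype=np.float64)
    mask = r2 > 0
    out[mask] = 0.5 * r2[mask] * np.log(r2[mask])  # r^2 log r = 0.5 * r^2 * log(r^2)
    return out

def pairwise_sq_dists(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Squared Euclidean distances between rows of a (M, 2) and rows of b (N, 2)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def fit_tps(src: np.ndarray, dst: np.ndarray):
    """Fit a TPS that maps each src control point onto its dst counterpart.

    src, dst: (N, 2). Returns (w, a) with radial weights w (N, 2) and affine part a (3, 2).
    """
    n = src.shape[0]
    K = tps_kernel(pairwise_sq_dists(src, src))
    P = np.hstack([np.ones((n, 1)), src])          # rows are (1, x, y)
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = K
    L[:n, n:] = P
    L[n:, :n] = P.T                                # side conditions on w
    rhs = np.vstack([dst, np.zeros((3, 2))])
    params = np.linalg.solve(L, rhs)
    return params[:n], params[n:]

def apply_tps(points: np.ndarray, src: np.ndarray, w: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Evaluate the fitted TPS at arbitrary points of shape (M, 2)."""
    U = tps_kernel(pairwise_sq_dists(points, src))
    P = np.hstack([np.ones((points.shape[0], 1)), points])
    return U @ w + P @ a

# Example: a 3x3 grid of control points and a perturbed copy of it.
gx, gy = np.meshgrid(np.linspace(-1, 1, 3), np.linspace(-1, 1, 3))
src = np.stack([gx.ravel(), gy.ravel()], axis=1)
dst = src + 0.1 * np.random.default_rng(0).standard_normal(src.shape)
w, a = fit_tps(src, dst)
```

Evaluating `apply_tps` on a dense pixel grid yields the sampling coordinates that a grid-sampler would consume; every operation is differentiable with respect to the control points.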
4. Degree-of-Freedom Curriculum Learning
To address the instability and complexity of full-rank TPS training from scratch, RecRecNet employs a three-stage curriculum:
- Stage 1 (4-DoF): Pre-train on similarity transformations (encompassing translation, rotation, and uniform scaling).
- Stage 2 (8-DoF): Pre-train using random homographies, expanding the allowable transformations.
- Stage 3 (Full TPS): Train on the complete, non-rigid TPS deformation required for rectangling.
Each curriculum stage uses the same motion head but increases the degrees of freedom (DoF) in the geometric targets. This method facilitates better localization of boundary control points and accelerates convergence, particularly in the more complex non-rigid regime (curved, inward-bulged boundaries to straightened rectangles) (Liao et al., 2023).
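As a rough illustration of the curriculum, the sketch below maps an epoch index to a stage using the 30/50/180 split reported in the optimization section, and shows what the 4-DoF and 8-DoF transformations do to a set of control points. The stage names, the 0-based epoch convention, and the helper functions are assumptions for the example rather than details from the paper.

```python
import math

# (stage name, number of epochs); the names are hypothetical labels.
STAGES = [("similarity_4dof", 30), ("homography_8dof", 50), ("tps_full", 180)]

def stage_for_epoch(epoch: int) -> str:
    """Return the curriculum stage for a 0-based epoch index."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    end = 0
    for name, length in STAGES:
        end += length
        if epoch < end:
            return name
    raise ValueError("epoch is beyond the 260-epoch schedule")

def similarity(points, tx, ty, angle, scale):
    """4-DoF warp: translation (2), rotation (1), uniform scale (1)."""
    c, s = scale * math.cos(angle), scale * math.sin(angle)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def homography(points, H):
    """8-DoF warp: a 3x3 matrix H (nested lists), defined up to scale."""
    out = []
    for x, y in points:
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
        v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
        out.append((u, v))
    return out
```

Because all three stages produce targets for the same set of control points, the motion head can be carried over unchanged from one stage to the next.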
5. Optimization and Losses
The composite loss function for RecRecNet comprises three terms, jointly optimized in each curriculum stage:
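- Appearance Loss: $\mathcal{L}_A$, a pixel-wise distance between the rectangled output and the pseudo-ground-truth image
- Perceptual Loss: $\mathcal{L}_P$, a distance between the features $\Phi(\cdot)$ of the output and of the pseudo-ground-truth, where $\Phi$ is a VGG-19 feature map
- Mesh Smoothness Loss: $\mathcal{L}_M$, a penalty over pairs of adjacent mesh edges, encouraging colinearity
The overall objective is:
$$\mathcal{L} = \lambda_A \mathcal{L}_A + \lambda_P \mathcal{L}_P + \lambda_M \mathcal{L}_M$$
where $\lambda_A$, $\lambda_P$, and $\lambda_M$ are the recommended weights of the three terms. Training is performed using Adam (lr=1e−4, decayed), for 260 epochs in total, partitioned as 30/50/180 across the three curriculum stages (Liao et al., 2023).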
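The structure of this objective can be sketched in NumPy as below. Several choices are assumptions made only for illustration: the appearance term is written as a mean absolute (L1) pixel difference, the mesh term uses one common colinearity penalty (1 minus the cosine of the angle between consecutive edges), the perceptual term is passed in as a precomputed number because it needs a pretrained VGG-19, and the weights are placeholder arguments rather than the paper's values.

```python
import numpy as np

def appearance_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute pixel difference (L1 is an assumption here)."""
    return float(np.mean(np.abs(pred.astype(np.float64) - target.astype(np.float64))))

def mesh_smoothness(mesh: np.ndarray) -> float:
    """Colinearity penalty for a mesh of shape (rows, cols, 2) with rows, cols >= 3.

    Averages 1 - cos(angle) over consecutive edge pairs along rows and columns,
    so a perfectly regular grid scores 0.
    """
    def penalty(e1: np.ndarray, e2: np.ndarray) -> np.ndarray:
        dot = np.sum(e1 * e2, axis=-1)
        norms = np.linalg.norm(e1, axis=-1) * np.linalg.norm(e2, axis=-1)
        return 1.0 - dot / np.maximum(norms, 1e-12)

    h = mesh[:, 1:] - mesh[:, :-1]  # horizontal edges, shape (rows, cols - 1, 2)
    v = mesh[1:, :] - mesh[:-1, :]  # vertical edges, shape (rows - 1, cols, 2)
    terms = [penalty(h[:, :-1], h[:, 1:]).ravel(), penalty(v[:-1], v[1:]).ravel()]
    return float(np.mean(np.concatenate(terms)))

def total_loss(l_app: float, l_perc: float, l_mesh: float,
               w_app: float, w_perc: float, w_mesh: float) -> float:
    """Weighted sum of the three terms; the weights are caller-supplied placeholders."""
    return w_app * l_app + w_perc * l_perc + w_mesh * l_mesh
```

The mesh term depends only on the predicted control points, so it regularizes the warp even in regions where the image terms provide weak gradients.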
6. Experimental Protocol and Results
- Data: 5,160 synthetic training pairs and 500 test pairs, based on MS-COCO images distorted with a 4th-order polynomial fisheye model. Rectangling “pseudo-ground-truth” is generated with a filtered mesh optimizer [He et al. 2013].
- Metrics: PSNR and SSIM for image quality; AP and mIoU for Mask-R-CNN-based detection and segmentation.
- Quantitative Performance:
- RecRecNet achieves ≈18.7 dB PSNR (+3 dB over baselines), SSIM ≈0.55 (+0.13), AP ≈41.3% (+6 points), and mIoU ≈37.8% (+7 points) relative to cropping/padding/He2013/ROP2021 methods.
- Qualitative Performance:
- Rectangled outputs have visually clean, straight boundaries without hallucinated regions (unlike outpainting), enabling Mask-R-CNN to recover objects previously missed near edges.
- Cross-Domain Generalization:
- The TPS model allows RecRecNet to generalize to real fisheye captures and alternate upstream rectifications without retraining (Liao et al., 2023).
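For reference, the PSNR figures above follow the standard definition, 10 log10(MAX^2 / MSE). The sketch below is a generic implementation, not the paper's evaluation script; the 8-bit peak value of 255 is an assumption about the image range.

```python
import numpy as np

def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    if pred.shape != target.shape:
        raise ValueError("images must have the same shape")
    diff = pred.astype(np.float64) - target.astype(np.float64)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

Under this definition, a gain of 3 dB corresponds to roughly halving the mean squared error against the pseudo-ground-truth.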
7. Insights, Limitations, and Future Directions
Preserving a strictly rectangular boundary is essential for downstream compatibility with off-the-shelf vision models; non-rectangular outputs introduce artifacts at feature map peripheries, severely impacting detection or segmentation near borders. The DoF-based curriculum substantially enhances both convergence speed and stability.
Primary limitations include reliance on pseudo-ground-truth (filtered estimates from a classical method), with a lack of real labeled rectangling data. In scenes with highly complex or non-convex borders, the method can still yield slight spatial stretching in the content. Potential future work includes on-the-fly RecRecNet augmentation during detector training to better close the domain gap, and extension to generalized deformations such as rolling shutter correction or image stitching (Liao et al., 2023).
Reference:
"RecRecNet: Rectangling Rectified Wide-Angle Images by Thin-Plate Spline Model and DoF-based Curriculum Learning" (Liao et al., 2023).