PanoTPS-Net: 3D Indoor Layout Estimation

Updated 15 June 2026

The paper introduces a CNN-based approach that formulates indoor room layout prediction as a learnable image-warping problem using a differentiable Thin Plate Spline transformation.
It employs a modified Xception backbone with a TPS spatial transformation layer to capture complex, non-cuboid geometries through smooth deformation of a reference layout.
The method reports competitive or state-of-the-art 3DIoU and 2DIoU scores across four benchmarks, handling both cuboid and irregular room structures.

PanoTPS-Net is a convolutional neural network (CNN) architecture for estimating the 3D layout of indoor rooms from a single 360° equirectangular panorama image via a differentiable Thin Plate Spline (TPS) transformation. The model formulates room layout prediction as a learnable image-warping problem, enabling robust generalization to both cuboid and non-cuboid room types. By leveraging the smoothness and flexibility of TPS, PanoTPS-Net bridges the gap between simple handcrafted reference layouts and the diverse structural complexity found in real-world environments (Ibrahem et al., 13 Oct 2025).

1. Problem Formulation and Motivation

The fundamental challenge addressed by PanoTPS-Net is the estimation of complete 3D room structure—including walls, floor, ceiling boundaries, and corner positions—from a single panoramic RGB image. Traditional approaches generally fall into two categories: edge (boundary) map prediction or direct regression of corner coordinates. These methods either impose restrictive Manhattan-world (rectilinear, cuboid) assumptions or struggle to generalize beyond basic geometry, leading to reduced accuracy in non-cuboid or irregular rooms.

The principal innovation of PanoTPS-Net is the framing of layout estimation as a spatial warping task. Starting from a simple reference layout (such as a canonical cuboid edge/corner map), the model predicts a TPS transformation that smoothly deforms the reference to match the target room shape in the panorama. TPS is selected for its capacity to satisfy precise control-point alignment while maintaining global smoothness by minimizing bending energy:

$E[U] = \sum_i\|U(x_i, y_i) - (x'_i, y'_i)\|^2 + \lambda \iint \left[ (U_{xx})^2 + 2(U_{xy})^2 + (U_{yy})^2 \right] dx \, dy$

This property enables the network to capture complex structural variations without the over-flexibility or instability of unconstrained transformation schemes.
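To make the bending-energy objective concrete, the following is a minimal NumPy sketch of the classical TPS fit: given control-point correspondences, the minimizer of the energy above is obtained from one linear solve. This illustrates the underlying spline only; PanoTPS-Net regresses the TPS parameters with a CNN rather than solving this system. The function names (`tps_kernel`, `fit_tps`, `eval_tps`) and the `lam` argument are hypothetical, and constant factors in the kernel are assumed to be absorbed into `lam`.

```python
import numpy as np

def tps_kernel(r2):
    """K(r) = r^2 log r^2, written in terms of r^2, with K(0) = 0."""
    out = np.zeros_like(r2, dtype=float)
    mask = r2 > 0
    out[mask] = r2[mask] * np.log(r2[mask])
    return out

def fit_tps(src, dst, lam=0.0):
    """Fit a TPS mapping src control points (N, 2) to dst (N, 2).

    Returns W of shape (N, 2) (nonlinear weights) and A of shape (2, 3)
    (affine part acting on [1, x, y]). lam = 0 interpolates exactly;
    lam > 0 trades alignment for smoothness.
    """
    n = src.shape[0]
    d2 = np.sum((src[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    K = tps_kernel(d2) + lam * np.eye(n)
    P = np.hstack([np.ones((n, 1)), src])
    # Block system [[K, P], [P^T, 0]] [W; A^T] = [dst; 0].
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = K
    L[:n, n:] = P
    L[n:, :n] = P.T
    rhs = np.vstack([dst, np.zeros((3, 2))])
    sol = np.linalg.solve(L, rhs)
    return sol[:n], sol[n:].T

def eval_tps(pts, src, W, A):
    """Evaluate the fitted spline at query points pts (M, 2)."""
    d2 = np.sum((pts[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    P = np.hstack([np.ones((pts.shape[0], 1)), pts])
    return P @ A.T + tps_kernel(d2) @ W
```

With `lam=0` the fitted spline passes through every target point, and an affine target yields zero nonlinear weights, since affine maps have zero bending energy.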

2. Network Architecture

PanoTPS-Net employs a two-stage process comprising:

CNN Feature Extractor: The model ingests a resized (1024×512) RGB panorama and computes latent feature embeddings using a modified Xception backbone ("MXception"), characterized by depth-wise separable convolutions for computational efficiency. Post-convolution, global average pooling reduces the spatial output to a feature vector, with the final fully connected layer regressing the parameters of the TPS transformation:
- Nonlinear control-point offsets $B \in \mathbb{R}^{N \times 2}$, for $N$ control points.
- The linear affine part $A \in \mathbb{R}^{2 \times 3}$, typically initialized as the identity.

TPS Spatial Transformation Layer: The predicted TPS parameters are applied to a regular source grid of control points $\{(x_i, y_i)\}_{i=1}^N$ over the reference map. The TPS deformation for a query coordinate $(x, y)$ is given by:

$T(x, y) = A \begin{pmatrix} 1 \\ x \\ y \end{pmatrix} + \sum_{i=1}^N b_i K(\| (x, y)-(x_i, y_i) \|)$

with the standard TPS kernel $K(r) = r^2 \log r^2$ and $b_i$ learned per control point. This layered design enables end-to-end differentiable image warping, integral to training via backpropagation.
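The deformation formula can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' layer: the helper names (`control_grid`, `tps_transform`), the square 4×4 grid, and the $[-1, 1]^2$ coordinate range are assumptions. It shows that an identity affine part with zero offsets leaves coordinates unchanged, which is the starting point implied by initializing $A$ as the identity.

```python
import numpy as np

def control_grid(rows, cols):
    """Regular grid of control points in [-1, 1]^2, shape (rows * cols, 2)."""
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, rows),
                         np.linspace(-1.0, 1.0, cols), indexing="ij")
    return np.stack([xs.ravel(), ys.ravel()], axis=1)

def tps_transform(points, ctrl, A, B):
    """T(x, y) = A [1, x, y]^T + sum_i b_i K(||(x, y) - (x_i, y_i)||).

    points: (M, 2) query coordinates; ctrl: (N, 2) control points;
    A: (2, 3) affine part; B: (N, 2) per-control-point offsets b_i.
    """
    ones = np.ones((points.shape[0], 1))
    affine = np.hstack([ones, points]) @ A.T
    r2 = np.sum((points[:, None, :] - ctrl[None, :, :]) ** 2, axis=-1)
    # K(r) = r^2 log r^2, with K(0) = 0 (safe log avoids log(0)).
    K = np.where(r2 > 0, r2 * np.log(np.where(r2 > 0, r2, 1.0)), 0.0)
    return affine + K @ B

ctrl = control_grid(4, 4)                      # 16 control points
A_identity = np.array([[0.0, 1.0, 0.0],        # x' = x
                       [0.0, 0.0, 1.0]])       # y' = y
B_zero = np.zeros((ctrl.shape[0], 2))
query = np.array([[0.25, -0.5], [0.0, 0.0]])
warped = tps_transform(query, ctrl, A_identity, B_zero)  # equals query
```

Every operation here is a differentiable function of `A` and `B`, which is what allows the same computation, expressed in a deep-learning framework, to be trained by backpropagation.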

3. Learning Objective and Loss Functions

The model outputs two warped predictions:

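A reference edge map $\hat{E}$ (with RGB channels encoding semantic boundaries: wall-wall, wall-ceiling, wall-floor),
A one-channel reference corner map $\hat{C}$ (as a corner-location heatmap).

A pixelwise Huber loss ($L_\delta$) penalizes deviations between predictions and ground truth:

$L_\delta(a) = \begin{cases} \frac{1}{2} a^2, & |a| \le \delta \\ \delta \left( |a| - \frac{1}{2} \delta \right), & \text{otherwise} \end{cases}$

where $a$ is the per-pixel residual between a prediction and its ground truth.

The aggregate loss is a weighted sum:

$\mathcal{L} = w_E \, \mathcal{L}_E + w_C \, \mathcal{L}_C$

where $\mathcal{L}_E$ and $\mathcal{L}_C$ are the Huber losses on the edge and corner maps and $w_E$, $w_C$ are their weights. The best results were obtained with the edge term emphasized (see the loss-weight ablation in Section 7). This dual-output formulation enforces both fine-grained boundary alignment and precise corner localization.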
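A minimal NumPy sketch of the Huber loss and the weighted edge/corner sum is given below. The function names and the default values of `delta`, `w_edge`, and `w_corner` are placeholders chosen for illustration, not the settings used in the paper.

```python
import numpy as np

def huber(pred, target, delta=1.0):
    """Mean pixelwise Huber loss: quadratic near zero, linear in the tails."""
    a = np.abs(pred - target)
    quadratic = 0.5 * a ** 2
    linear = delta * (a - 0.5 * delta)
    return float(np.mean(np.where(a <= delta, quadratic, linear)))

def layout_loss(edge_pred, edge_gt, corner_pred, corner_gt,
                w_edge=1.0, w_corner=1.0, delta=1.0):
    """Weighted sum of the edge-map and corner-map Huber losses.

    w_edge and w_corner are hypothetical placeholder weights.
    """
    return (w_edge * huber(edge_pred, edge_gt, delta)
            + w_corner * huber(corner_pred, corner_gt, delta))
```

The quadratic region keeps gradients small for nearly correct pixels, while the linear region limits the influence of large residuals, which is the usual reason to prefer Huber over a plain squared error.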

4. Training Procedure and Datasets

PanoTPS-Net is trained on a variety of public datasets encompassing both cuboid and non-cuboid layouts:

Dataset	Panorama Count	Layout Type	Usage
PanoContext (PC)	500	Cuboid	Train/Test
Stanford-2D3D (S2D3D)	571	Cuboid	Train/Test
Matterport3DLayout (MP3D)	2295	Mixed	Train/Val/Test
Zillow Indoor (ZInD)	~31,000	Non-cuboid prevalent	Test

Panoramas are resized to 1024×512 and normalized with ImageNet statistics. No geometric augmentation is performed aside from random horizontal flips. In non-cuboid settings, a corner map post-processing step splits merged corner blobs using a 75px width threshold for accurate localization.
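The preprocessing described here might look like the sketch below. It assumes the panorama has already been resized to 1024×512 and is stored as an RGB `uint8` array, and it uses the commonly quoted ImageNet per-channel mean and standard deviation; the function name `preprocess` and the explicit `flip` flag are hypothetical. The flip is applied to the image and both target maps together so that supervision stays aligned.

```python
import numpy as np

# Commonly used ImageNet RGB statistics for inputs scaled to [0, 1].
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(pano_uint8, edge_map, corner_map, flip=False):
    """Normalize a (512, 1024, 3) panorama and optionally flip it.

    edge_map: (512, 1024, 3) target; corner_map: (512, 1024) target.
    """
    x = pano_uint8.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    if flip:
        # Horizontal flip reverses the column (width) axis of all arrays.
        x = x[:, ::-1]
        edge_map = edge_map[:, ::-1]
        corner_map = corner_map[:, ::-1]
    return x, edge_map, corner_map
```

During training the flag could be drawn per sample, for example `flip = np.random.default_rng().random() < 0.5`.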

Optimization is performed in TensorFlow/Keras with Adam (learning rate 1e-4, weight decay 1e-6, batch size 8), using up to 500 epochs with early stopping. Pretrained Xception weights initialize the MXception backbone.

5. Evaluation Metrics and Comparative Performance

Performance is assessed via multiple criteria:

3D Intersection-over-Union (3DIoU): Measures volumetric overlap of predicted versus ground-truth cuboid layouts.
2D IoU: Evaluates planar overlap for more general, non-cuboid footprints.
Corner Error (CE): Pixelwise distance between predicted and true corner positions.
Pixel Error (PE): Per-pixel edge map difference.
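The two IoU metrics can be illustrated with a simplified sketch. It assumes each layout is rasterized to a boolean floor-plan mask on a shared grid and, for the 3D case, is treated as a vertical prism extruded from a common floor plane to a single ceiling height. The function names and this prism representation are assumptions made for illustration, not the benchmark evaluation code.

```python
import numpy as np

def iou_2d(mask_a, mask_b):
    """IoU of two boolean floor-plan masks on the same grid."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(inter / union) if union > 0 else 0.0

def iou_3d(mask_a, height_a, mask_b, height_b):
    """IoU of two prisms: floor-plan mask extruded from z = 0 to a height."""
    cells_inter = np.logical_and(mask_a, mask_b).sum()
    vol_inter = cells_inter * min(height_a, height_b)
    vol_a = mask_a.sum() * height_a
    vol_b = mask_b.sum() * height_b
    vol_union = vol_a + vol_b - vol_inter
    return float(vol_inter / vol_union) if vol_union > 0 else 0.0
```

Because the masks can be any shape, the same functions apply to cuboid and non-cuboid footprints.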

PanoTPS-Net matches or slightly exceeds the prior best on PanoContext, Matterport3DLayout, and ZInD 3DIoU, and trails it narrowly on Stanford-2D3D 3DIoU and ZInD 2DIoU:

Dataset	3DIoU (%)	2DIoU (%)	Prior best (3DIoU/2DIoU)
PanoContext	85.49	–	∼85.02
Stanford-2D3D	86.18	–	∼86.60
Matterport3DLayout	81.76	84.15	∼81.70 / 84.11
Zillow Indoor (ZInD)	91.98	90.05	∼91.94 / 90.13

These outcomes underscore the compatibility of TPS with panoramic input and its ability to handle complex indoor geometry.

6. Qualitative Analysis

Visualization of TPS control-point deformation (cf. Figure 1 in the primary source) reveals that source grid points (yellow dots) are smoothly mapped to targets (orange), permitting substantial but controlled shape adaptation. Sample outputs (Figures 3 and 3-1) demonstrate that reference cuboid maps can be deformed to T-shaped, L-shaped, and other multi-corner configurations.

Bird's-eye and 3D reconstructions (Figures 5 and 6) indicate superior geometric fidelity for non-Manhattan rooms compared to methods such as LED2Net, LGT-Net, and DOPNet, which often impose strong rectilinear biases or miss irregular structures.

7. Ablation Studies and Analysis of TPS Role

A sequence of controlled experiments isolates key architectural and design choices:

Backbone Selection: Off-the-shelf networks (ResNet50, InceptionV3, EfficientNet, ConvNeXt) either failed to converge or underperformed (3DIoU 30–75%). The MXception backbone reached 85.49%.
Warping Outputs: Warping only edge or corner maps led to reduced accuracy or convergence failure. Warping both yielded best results (3DIoU 85.5% on PC, 81.8% on MP3D).
Loss Weights: Emphasizing the edge loss (weighting the edge term above the corner term) was essential; corner-only supervision was too sparse.
TPS Control Points: Optimal flexibility was achieved with a moderate control-point count (16 for simple layouts, 64 for complex). Too few points led to poor fit; too many induced over-flexibility and artifacts in cuboid rooms.
Corner Post-processing: A threshold of 75 px for splitting merged corner blobs matched ground-truth counts most effectively.
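The corner post-processing step can be sketched as follows. The source states only that merged corner blobs are split using a 75 px width threshold, so the specific rule here is an assumption: the heatmap is collapsed to a per-column profile, runs of active columns are found, and any run wider than the threshold is cut at its midpoint into two corners. The function name `corner_columns` and the `score_thresh` value are hypothetical, and wrap-around at the panorama seam is ignored.

```python
import numpy as np

def corner_columns(heatmap, score_thresh=0.5, width_thresh=75):
    """Return x-coordinates of corners from a (H, W) corner heatmap.

    Hypothetical rule: a run of active columns wider than width_thresh
    is treated as two merged corners and split at its midpoint.
    """
    profile = heatmap.max(axis=0)          # strongest response per column
    active = profile > score_thresh
    # Locate [start, end) of each run of consecutive active columns.
    padded = np.concatenate([[False], active, [False]])
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = changes[0::2], changes[1::2]
    corners = []
    for s, e in zip(starts, ends):
        if e - s > width_thresh:
            mid = (s + e) // 2
            segments = [(s, mid), (mid, e)]
        else:
            segments = [(s, e)]
        for a, b in segments:
            # Place the corner at the peak of the profile in each segment.
            corners.append(int(a + np.argmax(profile[a:b])))
    return corners
```

A smaller threshold would split more blobs and tend to over-count corners, while a larger one would leave merged corners unsplit, which is consistent with the ablation identifying an intermediate value as the best match to ground-truth corner counts.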

These findings corroborate the importance of TPS-based spatial transformation and joint edge/corner warping for stable and generalizable layout estimation across diverse room geometries (Ibrahem et al., 13 Oct 2025).

References (1)

PanoTPS-Net: Panoramic Room Layout Estimation via Thin Plate Spline Transformation (2025)
