Topology-Constrained 2D-DTW Algorithm

Updated 6 December 2025

The paper introduces a topology-constrained 2D-DTW that preserves grid integrity by enforcing a monotonic column mapping via dynamic programming.
It employs column-wise 1D-DTW to compute dissimilarity measures, yielding robust correspondences despite perspective distortions and occlusions.
Experimental validation shows improved matching accuracy, lower 3D reconstruction error (≈1 mm), and real-time performance on mobile platforms.

Topology-constrained two-dimensional dynamic time warping (2D-DTW) is an algorithmic framework designed for robust matching between a structured, ideal 2D grid and its observed, spatially deformed version—particularly under nontrivial conditions encountered during structured-light terrain sensing. Distinguished by a global monotonic consistency constraint, 2D-DTW preserves the topological integrity of the grid while aligning columns using dynamic programming. The methodology yields accurate correspondences even with perspective distortion and partial occlusion, enabling resource-efficient 3D reconstruction from smartphone-based projection systems (Nobuaki, 29 Nov 2025).

1. Formal Structure and Problem Statement

The central objective is to align two discrete surfaces: $A\in\mathbb{R}^{p\times q}$, representing the ideal grid (e.g., projected by a smartphone), and $B\in\mathbb{R}^{r\times s}$, representing the detected, possibly distorted, grid captured in the camera image. Each column $i$ of $A$ is profiled as $A_i=[A_{1,i},\dots,A_{p,i}]^T$, and each column $j$ of $B$ as $B_j=[B_{1,j},\dots,B_{r,j}]^T$. The mapping seeks to assign columns of $A$ to those of $B$ such that triangulation from these correspondences maintains rectilinear grid connectivity and yields consistent 3D geometry.

This column-centric formulation is justified by the axis-aligned nature of the projected grid (with the UI “north-up”). The procedure thus emphasizes warping along the column dimension within a monotonic mapping framework, preventing nonphysical foldovers or crossings.

2. Cost Function and Dynamic Programming Recurrence

The alignment process comprises two main stages:

Step 1: Column-wise 1D-DTW Computation

Each pair $(i,j)$ is compared via 1D-DTW between the column profiles $C^A_i = A_i$ and $C^B_j = B_j$, yielding a dissimilarity measure:

$d(C^A_i, C^B_j) = \min_w \sum_{(k,\ell)\in w} \|A_i[k] - B_j[\ell]\|^2$

where $w$ denotes a valid warping path subject to boundary, monotonicity, and step-size constraints. These pairwise distances populate the matrix $D\in\mathbb{R}^{q\times s}$, with $D_{i,j} = d(C^A_i, C^B_j)$.
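Step 1 can be sketched as follows. This is a minimal illustration, assuming scalar profile entries and the unit step pattern (match, insertion, deletion); the function names `dtw_distance` and `column_cost_matrix` are hypothetical and not taken from the paper.

```python
from math import inf

def dtw_distance(a, b):
    """Squared-error 1D-DTW cost between two scalar sequences a and b."""
    n, m = len(a), len(b)
    # acc[k][l] = minimal cost of aligning a[:k] with b[:l]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for k in range(1, n + 1):
        for l in range(1, m + 1):
            local = (a[k - 1] - b[l - 1]) ** 2
            acc[k][l] = local + min(acc[k - 1][l],      # a advances alone
                                    acc[k][l - 1],      # b advances alone
                                    acc[k - 1][l - 1])  # both advance
    return acc[n][m]

def column_cost_matrix(A_cols, B_cols):
    """D[i][j] = DTW cost between ideal column i and observed column j."""
    return [[dtw_distance(a, b) for b in B_cols] for a in A_cols]
```

Here `A_cols` and `B_cols` are assumed to be lists of column profiles (one list of numbers per column), so the profiles may differ in length, as happens when $p \neq r$.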

Step 2: Extraction of a Globally Consistent Path

Dynamic programming is applied to $D$, accumulating costs in $F\in\mathbb{R}^{q\times s}$:

$F_{i,j} = D_{i,j} + \min\{F_{i-1,j}, F_{i,j-1}, F_{i-1,j-1}\}$

Initialization:

$F_{1,1} = D_{1,1}, \quad F_{i,1}=D_{i,1}+F_{i-1,1}, \quad F_{1,j}=D_{1,j}+F_{1,j-1}$

The optimal correspondence path $w^*$ is recovered by backtracking from the last row $F_{q,:}$ of the accumulated cost matrix to the start $F_{1,1}$, following in reverse the allowed moves $(1,0)$, $(0,1)$, and $(1,1)$ in $(i,j)$ space.
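The recurrence, initialization, and backtracking can be sketched together as below. This is an illustrative reading rather than the paper's implementation: indices are 0-based, the function name `accumulate_and_backtrack` is hypothetical, and the backtrack is assumed to start from the cheapest cell of the last row, which is one interpretation of $F_{q,:}$.

```python
def accumulate_and_backtrack(D):
    """Fill F from D, then trace a minimum-cost monotonic path back to (0, 0)."""
    q, s = len(D), len(D[0])
    F = [[0.0] * s for _ in range(q)]
    for i in range(q):
        for j in range(s):
            if i == 0 and j == 0:
                best = 0.0
            elif i == 0:
                best = F[0][j - 1]          # first row: only horizontal moves
            elif j == 0:
                best = F[i - 1][0]          # first column: only vertical moves
            else:
                best = min(F[i - 1][j], F[i][j - 1], F[i - 1][j - 1])
            F[i][j] = D[i][j] + best

    # Assumption: start from the cheapest cell in the last row.
    i = q - 1
    j = min(range(s), key=lambda c: F[q - 1][c])
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((F[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((F[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((F[i][j - 1], i, j - 1))
        _, i, j = min(candidates)           # cheapest predecessor
        path.append((i, j))
    path.reverse()
    return F, path
```

If the end point should instead be fixed at the bottom-right corner, replacing the starting `j` with `s - 1` gives the classic closed-end DTW backtrack.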

3. Enforcement of Topological Consistency

The dynamic programming constraints, which permit only right, down, or down-right steps, ensure monotonic progression in both the display grid index $i$ and the observed grid index $j$. The resulting mapping is order-preserving, although not strictly one-to-one, because right and down steps let a single column pair with several neighbors. It inherently maintains grid connectivity without additional penalty functions, as violations such as crossings or foldbacks become infeasible by construction. A plausible implication is that this preserves the rectangular structure essential for structured-light triangulation.
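The by-construction guarantee can be made concrete with a small validity check on a candidate path. The helper name `is_monotonic_path` and the representation of a path as a list of 0-based `(i, j)` tuples are assumptions made for illustration.

```python
ALLOWED_STEPS = {(1, 0), (0, 1), (1, 1)}

def is_monotonic_path(path):
    """Return True if every consecutive step is one of the allowed moves."""
    return all(
        (i2 - i1, j2 - j1) in ALLOWED_STEPS
        for (i1, j1), (i2, j2) in zip(path, path[1:])
    )
```

A crossing such as `[(0, 1), (1, 0)]` requires the step `(1, -1)`, which is not in the allowed set, so the check rejects it.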

4. Robustness to Perspective Distortion and Occlusion

Perspective distortion introduces nonuniform row spacing among grid intersections, while partial occlusion can remove detected intersections entirely. The column-profile 1D-DTW calculation is robust to such nonuniformity: warping along the row indices matches salient structural features regardless of scale variation. In cases of occlusion or missing data, DTW "skips" indices (using the allowed step transitions), which inflates local costs but requires no custom penalties. Global path extraction then avoids high-cost pairings, naturally circumventing severely occluded regions.

5. Computational Complexity and Resource Efficiency

The runtime for computing the $D$ matrix scales as $O(qs\cdot pr)$, which, for $p\approx r\approx q\approx s\approx N$, yields $O(N^4)$ overall. Dynamic programming over $F$ introduces an additional $O(N^2)$ cost. Memory requirements are bounded by storage of $D$ and $F$ ($O(N^2)$ each); no 4D DP array is constructed, maintaining practical memory consumption for $N$ in the 20–30 range typical for smartphone grids.
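As a rough worked example, assume $p=q=r=s=N=25$, a value inside the quoted 20–30 range chosen here purely for illustration. The cell-update counts are then:

$qs\cdot pr = 25^4 = 390{,}625$ (inner 1D-DTW updates needed to fill $D$)

$qs = 25^2 = 625$ (updates needed to fill $F$)

Under this assumption the column-wise DTW stage dominates the path extraction by a factor of $N^2 = 625$, while $D$ and $F$ together hold only $2\cdot 625 = 1{,}250$ entries.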

A lightweight greedy alternative reduces complexity further by tracing local minima paths in $D$ , but this sacrifices global alignment optimality.
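The source does not spell out the greedy variant, so the following is only one plausible sketch: starting at the top-left cell, it repeatedly steps to whichever allowed neighbor has the smallest entry in $D$ until it reaches the bottom-right cell. The function name `greedy_path` and the fixed start and end cells are assumptions.

```python
def greedy_path(D):
    """Follow locally cheapest allowed moves through D; not globally optimal."""
    q, s = len(D), len(D[0])
    i = j = 0
    path = [(0, 0)]
    while (i, j) != (q - 1, s - 1):
        moves = []
        if i + 1 < q and j + 1 < s:
            moves.append((D[i + 1][j + 1], i + 1, j + 1))
        if i + 1 < q:
            moves.append((D[i + 1][j], i + 1, j))
        if j + 1 < s:
            moves.append((D[i][j + 1], i, j + 1))
        _, i, j = min(moves)   # cheapest neighbor; ties favor smaller indices
        path.append((i, j))
    return path
```

This visits at most $q+s-1$ cells after $D$ is built, but because each choice looks only one step ahead, a cheap early move can lead into an expensive region that the full dynamic program would have avoided.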

6. Algorithmic Workflow

The following outlines the full procedure:

Capture camera image and detect grid intersections via LoG filtering, skeletonization, and intersection detection.
Organize detected intersections into column profiles $B_j$; profiles $A_i$ are predetermined by grid geometry.
Compute $d(C^A_i, C^B_j)$ for each $(i,j)$, populating $D$.
Execute dynamic programming over $D$ to fill $F$ and backtrack for the optimal correspondence path $w^*$.
Derive a continuous column mapping $a(i)$, defined as the mean of the $j$-values paired with column $i$ along $w^*$.
Form correspondences for each grid intersection $(i,k)$ to its column-matched observed counterpart.
Apply triangulation to matched pairs to recover the 3D ground-plane position.
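The column-mapping step of the workflow above can be sketched as below, assuming $w^*$ is available as a list of 0-based `(i, j)` pairs. The function name `column_mapping` is hypothetical.

```python
from collections import defaultdict

def column_mapping(path):
    """Map each ideal column i to the mean of the observed columns j paired with it."""
    js_per_column = defaultdict(list)
    for i, j in path:
        js_per_column[i].append(j)
    return {i: sum(js) / len(js) for i, js in sorted(js_per_column.items())}
```

For the path `[(0, 0), (0, 1), (1, 2), (2, 2)]`, column 0 is paired with observed columns 0 and 1, so the mapping is `{0: 0.5, 1: 2.0, 2: 2.0}`. The fractional value is what makes the mapping continuous rather than integer-valued.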

7. Experimental Performance Validation

Evaluation across three terrain types—high-texture random-dot, medium-texture tile/wood, and low-texture vinyl—demonstrated the following:

Superior intersection matching success rates on low-texture floors compared to ORB+RANSAC stereo and nearest-neighbor matching.
Lower height-reconstruction RMSE (≈1 mm) versus feature-based triangulation (≈3 mm) for medium/low-texture scenes.
Real-time inference (~50 ms/frame) on Android hardware, compared with multi-second runtimes for exhaustive non-topological 2D-DTW and bundle adjustment.

These results confirm that exploiting grid topology and enforcing monotonic constraints in 2D-DTW delivers robustness to perspective distortion and occlusion while retaining computational efficiency suitable for mobile, resource-constrained platforms (Nobuaki, 29 Nov 2025).


References (1)

Terrain Sensing with Smartphone Structured Light: 2D Dynamic Time Warping for Grid Pattern Matching (2025)
