Topology-Constrained 2D-DTW Algorithm
- The paper introduces a topology-constrained 2D-DTW that preserves grid integrity by enforcing a monotonic column mapping via dynamic programming.
- It employs column-wise 1D-DTW to compute dissimilarity measures, yielding robust correspondences despite perspective distortions and occlusions.
- Experimental validation shows improved matching accuracy, lower 3D reconstruction error (≈1 mm), and real-time performance on mobile platforms.
Topology-constrained two-dimensional dynamic time warping (2D-DTW) is an algorithmic framework designed for robust matching between a structured, ideal 2D grid and its observed, spatially deformed version—particularly under nontrivial conditions encountered during structured-light terrain sensing. Distinguished by a global monotonic consistency constraint, 2D-DTW preserves the topological integrity of the grid while aligning columns using dynamic programming. The methodology yields accurate correspondences even with perspective distortion and partial occlusion, enabling resource-efficient 3D reconstruction from smartphone-based projection systems (Nobuaki, 29 Nov 2025).
1. Formal Structure and Problem Statement
The central objective is to align two discrete surfaces: the ideal grid (e.g., as projected by a smartphone) and the detected, possibly distorted grid captured in the camera image. Each column of the ideal grid is summarized as a one-dimensional profile, as is each column of the detected grid. The mapping assigns columns of the ideal grid to columns of the detected grid such that triangulation from these correspondences maintains rectilinear grid connectivity and yields consistent 3D geometry.
This column-centric formulation is justified by the axis-aligned nature of the projected grid (with the UI “north-up”). The procedure thus emphasizes warping along the column dimension within a monotonic mapping framework, preventing nonphysical foldovers or crossings.
2. Cost Function and Dynamic Programming Recurrence
The alignment process comprises two main stages:
Step 1: Column-wise 1D-DTW Computation
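Each pair of columns, one from the ideal grid and one from the detected grid, is compared via 1D-DTW between the two column profiles. The resulting dissimilarity measure is the minimum accumulated local cost over all valid warping paths, where a valid warping path is one that satisfies the boundary, monotonicity, and step-size constraints. These pairwise distances populate the column dissimilarity matrix.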
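A minimal sketch of this step, assuming each column profile is a plain sequence of numbers and that the local cost is the absolute difference between samples (the source does not state the local cost). The names `dtw_1d` and `dissimilarity_matrix` are hypothetical.

```python
import math

def dtw_1d(a, b):
    """Return the 1D-DTW dissimilarity between two numeric profiles."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("profiles must be non-empty")
    # acc[i][j] = cheapest cost of aligning a[:i] with b[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0  # boundary constraint: paths start at the first samples
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            # unit steps only, never backward (step-size and monotonicity)
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]  # boundary constraint: paths end at the last samples

def dissimilarity_matrix(ideal_cols, observed_cols):
    """Pairwise 1D-DTW between every ideal and every observed column profile."""
    return [[dtw_1d(p, q) for q in observed_cols] for p in ideal_cols]
```

Rows of the returned matrix index ideal columns and its columns index observed columns; this orientation is a convention chosen for the sketch.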
Step 2: Extraction of a Globally Consistent Path
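Dynamic programming is applied to the column dissimilarity matrix, accumulating costs in a second matrix of the same size: each entry adds its own dissimilarity to the smallest accumulated cost among its admissible predecessors.
Initialization seeds the accumulation at the first column pair.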
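One way to write the accumulation and its initialization, using notation introduced here purely for illustration (D for the pairwise dissimilarity matrix, C for the accumulated cost, i and j for ideal and observed column indices; none of these symbols is confirmed by the source) and assuming the three admissible moves described in Section 3:

```latex
\begin{aligned}
C(i,j) &= D(i,j) + \min\bigl\{\, C(i-1,j),\; C(i,j-1),\; C(i-1,j-1) \,\bigr\}, \\
C(1,1) &= D(1,1), \qquad C(i,0) = C(0,j) = \infty .
\end{aligned}
```

The infinite border values simply exclude predecessors that fall outside the matrix.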
The optimal correspondence path is recovered by tracing the minimum-cost path backward from the final entry of the accumulated-cost matrix to the start, subject to the allowed moves.
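The accumulation and backtracking can be sketched as follows, assuming the dissimilarity matrix is a non-empty list of rows and that the path runs from the first column pair to the last. The function name `global_column_path` is hypothetical.

```python
import math

def global_column_path(D):
    """Accumulate costs over D and backtrack the minimum-cost monotonic path."""
    n, m = len(D), len(D[0])
    C = [[math.inf] * m for _ in range(n)]
    C[0][0] = D[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            # cheapest admissible predecessor: from above, from the left, or diagonal
            best_prev = min(
                C[i - 1][j] if i > 0 else math.inf,
                C[i][j - 1] if j > 0 else math.inf,
                C[i - 1][j - 1] if i > 0 and j > 0 else math.inf,
            )
            C[i][j] = D[i][j] + best_prev
    # Backtrack from the final entry to the start along cheapest predecessors.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((C[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((C[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((C[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return C[n - 1][m - 1], path
```

For example, `global_column_path([[0, 5, 9], [5, 0, 5], [9, 5, 0]])` returns a total cost of 0 with the diagonal path `[(0, 0), (1, 1), (2, 2)]`.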
3. Enforcement of Topological Consistency
The dynamic programming constraints, which permit only right, down, or down-right steps, ensure monotonic progression in both the display grid column index and the observed grid column index. This order-preserving mapping maintains grid connectivity without additional penalty functions, because violations such as crossings or foldbacks are infeasible by construction. The correspondence is strictly one-to-one only along down-right steps, since a right or down step pairs one column with more than one counterpart. A plausible implication is that this preserves the rectangular structure essential for structured-light triangulation.
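As an illustration of the constraint, a small check that a candidate path uses only the three permitted moves. The path representation (a list of index pairs) and the name `is_topology_preserving` are assumptions made for this sketch.

```python
def is_topology_preserving(path):
    """True if every step is right (0, 1), down (1, 0), or down-right (1, 1)."""
    allowed = {(0, 1), (1, 0), (1, 1)}
    return all(
        (i2 - i1, j2 - j1) in allowed
        for (i1, j1), (i2, j2) in zip(path, path[1:])
    )
```

A path such as `[(0, 0), (1, 1), (2, 0)]`, where the observed index moves backward, is rejected; this is the kind of crossing the recurrence can never produce.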
4. Robustness to Perspective Distortion and Occlusion
Perspective distortion introduces nonuniform row spacing among grid intersections, while partial occlusion can remove detected intersections entirely. The column-profile 1D-DTW calculation tolerates such nonuniformity: warping along the row indices matches salient structural features regardless of scale variation. Where data are occluded or missing, DTW “skips” indices (using the allowed step transitions), which inflates local costs but requires no custom penalties. Global path extraction then avoids high-cost pairings, naturally circumventing severely occluded regions.
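A toy illustration of this behaviour, not data from the paper: the profiles below are invented, the local cost is assumed to be an absolute difference, and `dtw_cost` and `pointwise_cost` are hypothetical helpers. It contrasts DTW with a rigid index-by-index comparison.

```python
import math

def dtw_cost(a, b):
    """1D-DTW cost with absolute-difference local cost (an assumption)."""
    acc = [[math.inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    acc[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            acc[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1]
            )
    return acc[len(a)][len(b)]

def pointwise_cost(a, b):
    """Rigid comparison: sample k of a against sample k of b, over the common length."""
    return sum(abs(x - y) for x, y in zip(a, b))

ideal = [0, 1, 2, 3, 4]
stretched = [0, 0, 1, 2, 2, 3, 4]  # nonuniform spacing repeats some samples
occluded = [0, 1, 3, 4]            # the sample with value 2 was not detected

print(dtw_cost(ideal, stretched), pointwise_cost(ideal, stretched))  # 0.0 5
print(dtw_cost(ideal, occluded), pointwise_cost(ideal, occluded))    # 1.0 2
```

The stretched profile aligns at zero cost under DTW, while the occluded one incurs a small local cost because the unmatched sample is absorbed by a neighbour.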
5. Computational Complexity and Resource Efficiency
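Computing the column dissimilarity matrix requires one 1D-DTW per column pair, so its runtime scales with the number of column pairs multiplied by the cost of a single 1D-DTW, which is proportional to the product of the two profile lengths. When the column and row counts of both grids are of comparable size, this is quartic in the grid dimension overall. Dynamic programming over the dissimilarity matrix adds a cost proportional to its number of entries, which is quadratic in the grid dimension. Memory requirements are bounded by storage of the dissimilarity and accumulated-cost matrices (quadratic each); no 4D DP array is constructed, which keeps memory consumption practical for grid dimensions in the 20–30 range typical of smartphone grids.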
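A back-of-the-envelope count makes the scaling concrete. It assumes both grids have the same number of columns and that every profile has one sample per grid row; the function name and both parameters are hypothetical.

```python
def dtw2d_operation_counts(n_cols, n_rows):
    """Rough cell counts for the two stages under the stated assumptions."""
    pairwise_cells = (n_cols ** 2) * (n_rows ** 2)  # one 1D-DTW table per column pair
    dp_cells = n_cols ** 2                          # global DP over the pairwise matrix
    matrix_entries = n_cols ** 2                    # storage per matrix
    return pairwise_cells, dp_cells, matrix_entries

print(dtw2d_operation_counts(25, 25))  # (390625, 625, 625)
```

For a 25 by 25 grid this is about 3.9 × 10^5 table cells for the pairwise stage, against only 625 entries for each stored matrix.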
A lightweight greedy alternative reduces complexity further by tracing a path of local minima through the column dissimilarity matrix, but this sacrifices global alignment optimality.
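The source does not specify the greedy variant. One plausible reading, sketched here under that assumption, always steps to the cheapest of the three admissible neighbours without accumulating costs. The name `greedy_column_path` is hypothetical.

```python
def greedy_column_path(D):
    """Follow the locally cheapest right, down, or down-right move through D."""
    n, m = len(D), len(D[0])
    i, j = 0, 0
    path = [(0, 0)]
    total = D[0][0]
    while (i, j) != (n - 1, m - 1):
        moves = []
        if i + 1 < n and j + 1 < m:
            moves.append((D[i + 1][j + 1], i + 1, j + 1))
        if i + 1 < n:
            moves.append((D[i + 1][j], i + 1, j))
        if j + 1 < m:
            moves.append((D[i][j + 1], i, j + 1))
        cost, i, j = min(moves)  # ties resolve toward the smaller row index
        total += cost
        path.append((i, j))
    return total, path
```

On `[[0, 1, 9], [2, 9, 9], [9, 9, 0]]` this sketch returns a total of 19, whereas the diagonal path costs 9, which shows how a locally attractive first step can lock the path into an expensive region.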
6. Algorithmic Workflow
The following outlines the full procedure:
- Capture camera image and detect grid intersections via LoG filtering, skeletonization, and intersection detection.
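- Organize detected intersections into observed column profiles; the ideal-grid profiles are predetermined by grid geometry.
- Compute the 1D-DTW dissimilarity for each pair of ideal and observed columns, populating the column dissimilarity matrix.
- Execute dynamic programming over the dissimilarity matrix to fill the accumulated-cost matrix, then backtrack for the optimal correspondence path.
- Derive a continuous column mapping from the observed-column index values along the optimal path.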
- Form correspondences for each grid intersection to its column-matched observed counterpart.
- Apply triangulation to matched pairs to recover the 3D ground-plane position.
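The step that turns the optimal path into a continuous column mapping could look like the sketch below. The source does not say how multiple matches for one column are combined; averaging the matched observed indices is an assumption made here, and `column_mapping` is a hypothetical name.

```python
from collections import defaultdict

def column_mapping(path):
    """Map each ideal column index to the mean of its matched observed indices."""
    matches = defaultdict(list)
    for i, j in path:
        matches[i].append(j)
    # Averaging is an assumed way to obtain a single, possibly fractional, value.
    return {i: sum(js) / len(js) for i, js in sorted(matches.items())}

print(column_mapping([(0, 0), (0, 1), (1, 2), (2, 2)]))  # {0: 0.5, 1: 2.0, 2: 2.0}
```

Because the path is monotonic, the resulting mapping is non-decreasing in the ideal column index.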
7. Experimental Performance Validation
Evaluation across three terrain types—high-texture random-dot, medium-texture tile/wood, and low-texture vinyl—demonstrated the following:
- Superior intersection matching success rates on low-texture floors compared to ORB+RANSAC stereo and nearest-neighbor matching.
- Lower height-reconstruction RMSE (≈1 mm) versus feature-based triangulation (≈3 mm) for medium/low-texture scenes.
- Real-time inference (~50 ms/frame) on Android hardware, in contrast to multi-second runtimes for exhaustive non-topological 2D-DTW and bundle adjustment.
These results confirm that exploiting grid topology and enforcing monotonic constraints in 2D-DTW delivers robustness to perspective distortion and occlusion while retaining computational efficiency suitable for mobile, resource-constrained platforms (Nobuaki, 29 Nov 2025).