Smartphone-Based Structured-Light System
- The paper introduces a smartphone-based system that projects a structured light grid to reconstruct precise 3D terrain profiles.
- It employs a topology-constrained 2D dynamic time warping algorithm to reliably match ideal and observed grid patterns amid perspective distortions.
- Experimental results demonstrate over 95% valid 3D reconstructions at 8–12 fps on mid-range Android devices, emphasizing its real-time performance.
A smartphone-based structured-light system utilizes a consumer smartphone’s display as a structured-light projector and its camera as the sensing component to perceive local terrain unevenness. This configuration projects a high-contrast grid pattern onto the ground, then analyzes the pattern’s deformation in the captured camera image to reconstruct the three-dimensional profile of the terrain. The technical framework centers on robust matching of ideal and observed grid patterns through a topology-constrained two-dimensional dynamic time warping (2D-DTW) technique, enabling accurate 3D triangulation on resource-limited hardware. The system is designed for near-field ground sensing, especially in contexts where subtle terrain variations critically affect locomotion stability, as in mobile rovers (Nobuaki, 29 Nov 2025).
1. Hardware and Optical Configuration
The display module projects a rectangular grid of known physical dimensions (in mm) and resolution (in pixels). Grid-line spacing is uniform, so display-pixel spacings translate directly into world spacings via the display's pixel pitch. The front-facing camera, typically at 1920×1080 resolution with a focal length calibrated in pixels, observes the ground through a custom narrow field-of-view film (SN-VCF), which restricts the angular response to a small cone about the optical axis. The projector (display) plane is maintained parallel to the ground plane at a fixed baseline height (in mm), and any lateral offset due to mounting is absorbed into extrinsic calibration.
Calibration consists of two main stages: intrinsic camera calibration (estimating focal length, principal point, and lens distortion via checkerboard patterns) and extrinsic calibration between display and camera. Display pixel coordinates are modeled as virtual point sources on the display plane, and the rigid projector–camera transform (rotation and translation) is computed via Lie-algebra maximum-likelihood optimization using ground control points and their image projections.
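The display-as-projector model above can be sketched in a few lines: a display pixel becomes a virtual 3D point source on the display plane, which the calibrated rigid transform maps into the camera frame. The pixel pitch and extrinsics below are illustrative assumptions, not the paper's calibration values.

```python
import numpy as np

# Illustrative sketch: a display pixel is modeled as a virtual point source
# on the display plane (z = 0), then mapped into the camera frame by the
# calibrated rigid transform (R, t). Values here are assumptions.

PIXEL_PITCH_MM = 0.06        # assumed display pixel pitch (mm/px)

def display_pixel_to_camera(u, v, R, t):
    """Map display pixel (u, v) to a 3D point in the camera frame (mm)."""
    p_display = np.array([u * PIXEL_PITCH_MM, v * PIXEL_PITCH_MM, 0.0])
    return R @ p_display + t

# Example extrinsics: camera 20 mm to the side of the display, axes aligned.
R = np.eye(3)
t = np.array([20.0, 0.0, 0.0])
p_cam = display_pixel_to_camera(100, 50, R, t)
```

In the actual system, R and t would come from the Lie-algebra maximum-likelihood optimization described above rather than being assumed.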
2. Projected Grid Pattern Design
The projected pattern is a bright rectangular lattice on a dark background, with grid columns and rows spaced uniformly in display pixels. In practice, equal column and row spacing yields a square lattice whose projected cell sizes are approximately 5–10 mm at the nominal baseline. The grid's orientation is locked to “north-up” using the onboard compass and IMU, maintaining world alignment for rover navigation. Uniform grid spacing and locked axes facilitate detection under perspective distortion, as vertical lines remain co-planar and parallel. The SN-VCF narrows the FOV, enhancing grid stability and keeping line width nearly constant even at large viewing angles.
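The lattice described above is straightforward to synthesize. The sketch below generates such a pattern with numpy; the spacing and line width are illustrative assumptions (the paper specifies them in display pixels tied to the physical pixel pitch).

```python
import numpy as np

# Minimal sketch of the projected pattern: a bright rectangular lattice on a
# dark background. Spacing and line width are illustrative assumptions.

def make_grid_pattern(height, width, spacing=40, line_width=2):
    """Return a uint8 image with bright grid lines every `spacing` pixels."""
    img = np.zeros((height, width), dtype=np.uint8)
    for x in range(0, width, spacing):
        img[:, x:x + line_width] = 255   # vertical grid lines (columns)
    for y in range(0, height, spacing):
        img[y:y + line_width, :] = 255   # horizontal grid lines (rows)
    return img

pattern = make_grid_pattern(240, 320)
```

Equal row and column spacing, as here, gives the square lattice the paper favors for stable detection under perspective distortion.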
3. Image Processing and Grid Intersection Detection
Images are acquired at about 10 fps, each frame immediately undistorted based on calibrated lens parameters. Preprocessing entails:
- Gaussian blurring, tuned to SN-VCF response, attenuates high-frequency sensor noise.
- Laplacian-of-Gaussian enhancement accentuates grid-line profiles.
- Global thresholding on LoG responses yields binary segmentation, followed by skeletonization into one-pixel-wide grid representations.
- Intersection detection, via connected-component analysis, localizes grid crossings as pixel coordinates.
Each step is optimized for real-time throughput on mainstream mobile hardware while retaining precision.
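The final step of the pipeline above can be sketched with a simple topological rule: on a one-pixel-wide skeleton, a pixel with three or more 8-connected neighbors is a candidate crossing. This detection rule is an illustrative assumption, not the paper's exact method; as the text notes, candidates are then grouped by connected-component analysis, which also merges the spurious near-crossing pixels this naive rule produces.

```python
import numpy as np

# Hedged sketch of grid-intersection detection on a one-pixel-wide skeleton:
# a skeleton pixel with >= 3 of its 8 neighbors set is a crossing candidate.
# Pixels adjacent to a true crossing may also fire; in practice these are
# merged into a single intersection by connected-component analysis.

def find_intersections(skeleton):
    """Return (row, col) coordinates of skeleton pixels with >= 3 neighbors."""
    s = (skeleton > 0).astype(np.uint8)
    padded = np.pad(s, 1)
    # Sum the 8 neighbors of every pixel via shifted views.
    neighbors = sum(
        padded[1 + dy: 1 + dy + s.shape[0], 1 + dx: 1 + dx + s.shape[1]]
        for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)
    )
    return list(zip(*np.nonzero((s == 1) & (neighbors >= 3))))

# Synthetic test case: one vertical and one horizontal skeleton line.
sk = np.zeros((9, 9), dtype=np.uint8)
sk[4, :] = 1
sk[:, 4] = 1
crossings = find_intersections(sk)
```
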
4. Topology-Constrained 2D Dynamic Time Warping (2D-DTW)
The central algorithmic innovation is a topology-constrained 2D-DTW that extends conventional 1D-DTW to robustly match two-dimensional grid patterns amid perspective distortion and partial occlusion.
- The cost matrix evaluates the minimum pathwise difference between the column profiles of the ideal grid and those of the observed intersections, subject to boundary, monotonicity, and step constraints.
- The global grid alignment seeks an optimal “river” path of minimum accumulated cost, solved via the standard dynamic programming recurrence $D(i,j) = c(i,j) + \min\{D(i-1,j),\ D(i,j-1),\ D(i-1,j-1)\}$, where $c(i,j)$ is the local matching cost between ideal column $i$ and observed column $j$.
- Free-endpoint matching permits partial column correspondences if the grid extends beyond the camera’s FOV.
- 2D consistency is enforced by repeating alignment row-wise. A greedy variant traces local minima for faster, approximate matching.
Computational complexity grows polynomially with the number of grid lines: quadratic terms dominate both the construction of the full column–column cost matrix and the dynamic-programming river-path extraction, with memory proportional to the cost matrices retained.
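The per-column building block of the scheme above is ordinary 1D dynamic time warping over coordinate profiles. The sketch below uses the standard boundary, monotonicity, and unit-step constraints; the paper's topology-constrained variant adds 2D consistency and free endpoints on top of this.

```python
import numpy as np

# Minimal 1D dynamic time warping: the building block applied per column and
# per row by the topology-constrained 2D-DTW. Standard constraints only
# (boundary, monotonicity, unit steps) -- not the paper's exact step pattern.

def dtw(a, b):
    """Return the DTW alignment cost between 1D sequences a and b."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]

ideal = [0.0, 10.0, 20.0, 30.0]        # ideal column positions
observed = [0.5, 10.2, 19.8, 29.9]     # slightly distorted observations
cost = dtw(ideal, observed)            # small cost: near-diagonal alignment
```

Occlusion handling in the paper corresponds to relaxing the boundary condition (free endpoints) and injecting large penalties for missing intersections into the cost matrix.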
5. Triangulation and 3D Reconstruction
Matched intersections provide correspondences between display-plane points and camera image pixels. Back-projecting each pixel through the camera intrinsics defines a camera ray, which is intersected with the corresponding projector ray (or the ground plane) to yield a 3D point. Under a stereo-disparity approximation with disparity $d$, baseline $b$, and focal length $f$, the depth at any column is $Z = f b / d$.
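The back-projection step can be sketched directly with the pinhole model: lift the pixel to a ray in the camera frame and scale it to meet the ground plane. The intrinsics and plane height below are illustrative values, not the paper's calibration.

```python
import numpy as np

# Hedged sketch of back-projection: a camera pixel becomes a ray through the
# pinhole model and is intersected with the ground plane. Intrinsics and the
# plane height are illustrative assumptions.

fx = fy = 1500.0            # focal length in pixels (assumed)
cx, cy = 960.0, 540.0       # principal point for a 1920x1080 image

def backproject_to_ground(u, v, ground_z):
    """Intersect the ray through pixel (u, v) with the plane z = ground_z."""
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])  # camera-frame direction
    s = ground_z / ray[2]                                # scale so z == ground_z
    return s * ray                                       # 3D point, camera frame

p = backproject_to_ground(1110.0, 540.0, 300.0)          # plane 300 mm away
```

Deviations of the recovered points from the nominal ground plane are what encode the terrain unevenness the system is built to measure.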
Missing or occluded intersections are handled by large penalties in the DTW cost matrix and by the free-endpoint strategy. Reconstruction holes are filled via local grid interpolation prior to fusion into the overall ground map.
6. Performance Assessment and Applications
Experimental results demonstrate low root-mean-square and median absolute errors for height estimates against reference blocks (10 mm, 20 mm, etc.) across multiple floor textures, with valid 3D point reconstruction rates exceeding 95% on low-texture floors (by contrast, feature-based stereo achieves less than 60%). The image-processing and 2D-DTW pipeline runs at 8–12 fps (2D-DTW: 30 ms/frame; preprocessing: 50 ms), yielding approximately 100 ms overall latency on mid-range Android devices.
The grid-matching framework extends to projector–camera calibration, AR marker tracking, robotic docking, and document-camera alignment with ruled backgrounds. The 2D-DTW paradigm is applicable to any structured grid pattern matching where mild perspective distortion occurs.
7. Limitations and Prospective Developments
Restrictions include orientation control via north-up grid locking (large tilts degrade detection), the planarity assumption for the ground (steep slopes or steps break model validity), and range (reliable estimation is limited to the near field by the short display–camera baseline).
Potential improvements comprise rotation-invariant 2D-DTW via polar or affine re-sampling, adoption of pseudo-random dot projector patterns to increase density and disambiguate periodic matches, and tighter fusion with IMU/odometry for visual–inertial SLAM to enable robust mapping of non-planar terrains.
This suggests ongoing efforts to expand real-world applicability and precision across broader robotics and computer-vision tasks where low-cost, mobile structured-light perception is valuable (Nobuaki, 29 Nov 2025).