Smartphone-Based Structured-Light System
- The paper introduces a smartphone-based system that projects a structured light grid to reconstruct precise 3D terrain profiles.
- It employs a topology-constrained 2D dynamic time warping algorithm to reliably match ideal and observed grid patterns amid perspective distortions.
- Experimental results demonstrate over 95% valid 3D reconstructions at 8–12 fps on mid-range Android devices, emphasizing its real-time performance.
A smartphone-based structured-light system utilizes a consumer smartphone’s display as a structured-light projector and its camera as the sensing component to perceive local terrain unevenness. This configuration projects a high-contrast grid pattern onto the ground, then analyzes the pattern’s deformation in the captured camera image to reconstruct the three-dimensional profile of the terrain. The technical framework centers on robust matching of ideal and observed grid patterns through a topology-constrained two-dimensional dynamic time warping (2D-DTW) technique, enabling accurate 3D triangulation on resource-limited hardware. The system is designed for near-field ground sensing, especially in contexts where subtle terrain variations critically affect locomotion stability, as in mobile rovers (Nobuaki, 29 Nov 2025).
1. Hardware and Optical Configuration
The display module projects a rectangular grid of known physical dimensions (in mm) and resolution (in pixels). Grid-line spacing is uniform, so display-pixel spacings translate directly into world spacings via the display's pixel pitch. The front-facing camera, typically at 1920×1080 resolution with a focal length calibrated in pixels, observes the ground through a custom narrow field-of-view film (SN-VCF), which restricts the angular response to a small cone about the optical axis. The projector (display) plane is maintained parallel to the ground plane at a fixed baseline height (in mm), and any lateral offset due to mounting is absorbed into extrinsic calibration.
Calibration consists of two main stages: intrinsic camera calibration (estimating focal length, principal point, and lens distortion via checkerboard patterns) and extrinsic calibration between display and camera. Display pixel coordinates are modeled as virtual point sources on the display plane, and the rigid projector–camera transform (rotation and translation) is computed via Lie-algebra maximum-likelihood optimization using ground control points and their image projections.
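The display-as-projector model above can be sketched in a few lines: a display pixel becomes a virtual 3D point source on the display plane, which the calibrated rigid transform maps into the camera frame. The pixel pitch and extrinsics below are illustrative assumptions, not the paper's calibration values.

```python
import numpy as np

# Illustrative sketch: a display pixel is modeled as a virtual point source
# on the display plane (z = 0), then mapped into the camera frame by the
# calibrated rigid transform (R, t). Values here are assumptions.

PIXEL_PITCH_MM = 0.06        # assumed display pixel pitch (mm/px)

def display_pixel_to_camera(u, v, R, t):
    """Map display pixel (u, v) to a 3D point in the camera frame (mm)."""
    p_display = np.array([u * PIXEL_PITCH_MM, v * PIXEL_PITCH_MM, 0.0])
    return R @ p_display + t

# Example extrinsics: camera 20 mm to the side of the display, axes aligned.
R = np.eye(3)
t = np.array([20.0, 0.0, 0.0])
p_cam = display_pixel_to_camera(100, 50, R, t)
```

In the actual system, R and t would come from the Lie-algebra maximum-likelihood optimization described above rather than being assumed.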
2. Projected Grid Pattern Design
The projected pattern is a bright rectangular lattice on a dark background, with grid columns and rows spaced uniformly in display pixels. In practice, equal column and row spacing yields a square lattice whose projected cell sizes are approximately 5–10 mm at the nominal baseline. The grid's orientation is locked to “north-up” using the onboard compass and IMU, maintaining world alignment for rover navigation. Uniform grid spacing and locked axes facilitate detection under perspective distortion, as vertical lines remain co-planar and parallel. The SN-VCF narrows the FOV, enhancing grid stability and keeping line width nearly constant even at large viewing angles.
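The lattice described above is straightforward to synthesize. The sketch below generates such a pattern with numpy; the spacing and line width are illustrative assumptions (the paper specifies them in display pixels tied to the physical pixel pitch).

```python
import numpy as np

# Minimal sketch of the projected pattern: a bright rectangular lattice on a
# dark background. Spacing and line width are illustrative assumptions.

def make_grid_pattern(height, width, spacing=40, line_width=2):
    """Return a uint8 image with bright grid lines every `spacing` pixels."""
    img = np.zeros((height, width), dtype=np.uint8)
    for x in range(0, width, spacing):
        img[:, x:x + line_width] = 255   # vertical grid lines (columns)
    for y in range(0, height, spacing):
        img[y:y + line_width, :] = 255   # horizontal grid lines (rows)
    return img

pattern = make_grid_pattern(240, 320)
```

Equal row and column spacing, as here, gives the square lattice the paper favors for stable detection under perspective distortion.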
3. Image Processing and Grid Intersection Detection
Images are acquired at about 10 fps, each frame immediately undistorted based on calibrated lens parameters. Preprocessing entails:
- Gaussian blurring, tuned to SN-VCF response, attenuates high-frequency sensor noise.
- Laplacian-of-Gaussian enhancement accentuates grid-line profiles.
- Global thresholding on LoG responses yields binary segmentation, followed by skeletonization into one-pixel-wide grid representations.
- Intersection detection, via connected-component analysis, localizes grid crossings as pixel coordinates.
Each step is optimized for real-time throughput on mainstream mobile hardware while retaining precision.
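The final step of the pipeline above can be sketched with a simple topological rule: on a one-pixel-wide skeleton, a pixel with three or more 8-connected neighbors is a candidate crossing. This detection rule is an illustrative assumption, not the paper's exact method; as the text notes, candidates are then grouped by connected-component analysis, which also merges the spurious near-crossing pixels this naive rule produces.

```python
import numpy as np

# Hedged sketch of grid-intersection detection on a one-pixel-wide skeleton:
# a skeleton pixel with >= 3 of its 8 neighbors set is a crossing candidate.
# Pixels adjacent to a true crossing may also fire; in practice these are
# merged into a single intersection by connected-component analysis.

def find_intersections(skeleton):
    """Return (row, col) coordinates of skeleton pixels with >= 3 neighbors."""
    s = (skeleton > 0).astype(np.uint8)
    padded = np.pad(s, 1)
    # Sum the 8 neighbors of every pixel via shifted views.
    neighbors = sum(
        padded[1 + dy: 1 + dy + s.shape[0], 1 + dx: 1 + dx + s.shape[1]]
        for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)
    )
    return list(zip(*np.nonzero((s == 1) & (neighbors >= 3))))

# Synthetic test case: one vertical and one horizontal skeleton line.
sk = np.zeros((9, 9), dtype=np.uint8)
sk[4, :] = 1
sk[:, 4] = 1
crossings = find_intersections(sk)
```
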
4. Topology-Constrained 2D Dynamic Time Warping (2D-DTW)
The central algorithmic innovation is a topology-constrained 2D-DTW that extends conventional 1D-DTW to robustly match two-dimensional grid patterns amid perspective distortion and partial occlusion.
- The cost matrix evaluates the minimum pathwise difference between the column profiles of the ideal grid and those of the observed intersections, subject to boundary, monotonicity, and step constraints.
- The global grid alignment seeks an optimal “river” path of minimum accumulated cost, solved via the standard dynamic programming recurrence $D(i,j) = c(i,j) + \min\{D(i-1,j),\ D(i,j-1),\ D(i-1,j-1)\}$, where $c(i,j)$ is the local matching cost between ideal column $i$ and observed column $j$.
- Free-endpoint matching permits partial column correspondences if the grid extends beyond the camera’s FOV.
- 2D consistency is enforced by repeating alignment row-wise. A greedy variant traces local minima for faster, approximate matching.
Computational complexity grows polynomially with the number of grid lines: quadratic terms dominate both the construction of the full column–column cost matrix and the dynamic-programming river-path extraction, with memory proportional to the cost matrices retained.
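The per-column building block of the scheme above is ordinary 1D dynamic time warping over coordinate profiles. The sketch below uses the standard boundary, monotonicity, and unit-step constraints; the paper's topology-constrained variant adds 2D consistency and free endpoints on top of this.

```python
import numpy as np

# Minimal 1D dynamic time warping: the building block applied per column and
# per row by the topology-constrained 2D-DTW. Standard constraints only
# (boundary, monotonicity, unit steps) -- not the paper's exact step pattern.

def dtw(a, b):
    """Return the DTW alignment cost between 1D sequences a and b."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]

ideal = [0.0, 10.0, 20.0, 30.0]        # ideal column positions
observed = [0.5, 10.2, 19.8, 29.9]     # slightly distorted observations
cost = dtw(ideal, observed)            # small cost: near-diagonal alignment
```

Occlusion handling in the paper corresponds to relaxing the boundary condition (free endpoints) and injecting large penalties for missing intersections into the cost matrix.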
5. Triangulation and 3D Reconstruction
Matched intersections provide correspondences between display-plane points and camera image pixels. Back-projecting each pixel through the camera intrinsics defines a camera ray, which is intersected with the corresponding projector ray (or the ground plane) to yield a 3D point. Under a stereo-disparity approximation with disparity $d$, baseline $b$, and focal length $f$, the depth at any column is $Z = f b / d$.
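The back-projection step can be sketched directly with the pinhole model: lift the pixel to a ray in the camera frame and scale it to meet the ground plane. The intrinsics and plane height below are illustrative values, not the paper's calibration.

```python
import numpy as np

# Hedged sketch of back-projection: a camera pixel becomes a ray through the
# pinhole model and is intersected with the ground plane. Intrinsics and the
# plane height are illustrative assumptions.

fx = fy = 1500.0            # focal length in pixels (assumed)
cx, cy = 960.0, 540.0       # principal point for a 1920x1080 image

def backproject_to_ground(u, v, ground_z):
    """Intersect the ray through pixel (u, v) with the plane z = ground_z."""
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])  # camera-frame direction
    s = ground_z / ray[2]                                # scale so z == ground_z
    return s * ray                                       # 3D point, camera frame

p = backproject_to_ground(1110.0, 540.0, 300.0)          # plane 300 mm away
```

Deviations of the recovered points from the nominal ground plane are what encode the terrain unevenness the system is built to measure.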
Missing or occluded intersections are handled by large penalties in the DTW cost matrix and by the free-endpoint strategy. Reconstruction holes are filled via local grid interpolation prior to fusion into the overall ground map.
6. Performance Assessment and Applications
Experimental results demonstrate low root-mean-square and median absolute errors for height estimates against reference blocks (10 mm, 20 mm, etc.) across multiple floor textures, with valid 3D point reconstruction rates exceeding 95% on low-texture floors (by contrast, feature-based stereo achieves less than 60%). The image-processing and 2D-DTW pipeline runs at 8–12 fps (2D-DTW: 30 ms/frame; preprocessing: 50 ms), yielding approximately 100 ms overall latency on mid-range Android devices.
The grid-matching framework extends to projector–camera calibration, AR marker tracking, robotic docking, and document-camera alignment with ruled backgrounds. The 2D-DTW paradigm is applicable to any structured grid pattern matching where mild perspective distortion occurs.
7. Limitations and Prospective Developments
Restrictions include orientation control via north-up grid locking (large tilts degrade detection), the planarity assumption for the ground (steep slopes or steps break model validity), and range (reliable estimation is limited to the near field by the short display–camera baseline).
Potential improvements comprise rotation-invariant 2D-DTW via polar or affine re-sampling, adoption of pseudo-random dot projector patterns to increase density and disambiguate periodic matches, and tighter fusion with IMU/odometry for visual–inertial SLAM to enable robust mapping of non-planar terrains.
This suggests ongoing efforts to expand real-world applicability and precision across broader robotics and computer-vision tasks where low-cost, mobile structured-light perception is valuable (Nobuaki, 29 Nov 2025).