
Glyph2Cloud: Affine-Invariant Gesture Recognition

Updated 6 December 2025
  • The Glyph2Cloud module transforms user-drawn glyphs into fixed-length point clouds for efficient, robust gesture recognition.
  • It employs the Squiggle algorithm with triangle-based affine alignment to ensure invariance to rotation, scale, skew, and reflection.
  • The module processes glyphs through steps like path regularization, affine mapping, and error metric evaluation to deliver high accuracy and sub-millisecond performance.

The Glyph2Cloud module provides a systematic approach for transforming user-drawn glyphs into regularized, fixed-length point clouds suitable for robust, affine-invariant gesture recognition. Grounded in the Squiggle algorithm, it supports recognition invariant to rotation, scale, skew, and reflection, and is optimized for real-time feedback with sub-millisecond latency on modern hardware. The key pipeline comprises milestone point extraction, triangle-based affine alignment, candidate selection, affine mapping, error metric computation, and a rigorous recognition loop. Affine transformations for template matching, robust filtering of degenerate cases, and reflection symmetries are precisely handled to enable high accuracy and efficient computation in gesture-based interfaces (Lee, 2011).

1. Milestone Point Cloud Construction

Raw input glyphs, typically sequences of pen or touch points $P = [p_0, p_1, \ldots, p_k]$, are first regularized to remove positional jitter and enforce consistent inter-sample distance. The procedure involves:

  • Path Regularization: Generate a new polyline $R$ from $P$ such that points are spaced approximately $\delta$ pixels apart, using interpolation whenever the accumulated segment length exceeds $\delta$. This is realized by a stepwise traversal and point insertion at length thresholds (referred to as path_regularize(P, $\delta$)).
  • Down-sampling to Milestone Points: From the regularized path $R$, down-sample to exactly $n$ milestone points (typically $n = 16$), securing uniform arc-length spacing. Arc-length parameterization $i/(n-1)$ ($i = 0, \ldots, n-1$) selects sample locations, with linear interpolation bridging intermediate positions (path_interpolate(R, n)).
  • Cloud Representation: The resulting $g = [g_0, g_1, \ldots, g_{n-1}],\; g_i \in \mathbb{R}^2$, encodes the glyph for further processing.
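
The two resampling steps above can be sketched as follows. This is a minimal illustration, not the reference implementation: only the names path_regularize and path_interpolate come from the text, while the tuple-based point representation and the handling of the path tail (any remainder shorter than $\delta$ is dropped) are assumptions. It expects $\delta > 0$, at least two distinct points, and $n \geq 2$.

```python
import math

def path_regularize(points, delta):
    """Resample a polyline so emitted points are about `delta` apart in arc length."""
    if not points:
        return []
    out = [points[0]]
    prev = points[0]
    acc = 0.0  # arc length travelled since the last emitted point
    for cur in points[1:]:
        seg = math.dist(prev, cur)
        # Emit interpolated points while the accumulated length reaches delta.
        while seg > 0 and acc + seg >= delta:
            t = (delta - acc) / seg
            prev = (prev[0] + t * (cur[0] - prev[0]),
                    prev[1] + t * (cur[1] - prev[1]))
            out.append(prev)
            seg = math.dist(prev, cur)
            acc = 0.0
        acc += seg
        prev = cur
    return out  # assumption: a tail shorter than delta is dropped

def path_interpolate(path, n):
    """Pick n points at arc-length fractions i/(n-1) along `path`."""
    cum = [0.0]  # cumulative arc length at each vertex
    for a, b in zip(path, path[1:]):
        cum.append(cum[-1] + math.dist(a, b))
    total = cum[-1]
    out, j = [], 0
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment that contains the target arc length.
        while j < len(path) - 2 and cum[j + 1] < target:
            j += 1
        seg = cum[j + 1] - cum[j]
        t = (target - cum[j]) / seg if seg > 0 else 0.0
        out.append((path[j][0] + t * (path[j + 1][0] - path[j][0]),
                    path[j][1] + t * (path[j + 1][1] - path[j][1])))
    return out

# Usage: an L-shaped stroke reduced to a 16-point milestone cloud.
stroke = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
cloud = path_interpolate(path_regularize(stroke, 2.0), 16)
```

Running the two passes in sequence keeps the milestone count fixed at $n$ regardless of how densely the device sampled the stroke.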

2. Triangle-Based Affine Alignment Framework

Affine alignment between input and template glyphs leverages triangles defined by ordered indices:

  • Path Length Calculation: For any cloud $p$, total path length $\lambda(p) = \sum_{i} \|p_{i+1} - p_i\|$.
  • Triangle Edge Matrix: For each index triplet $0 \leq a < b < c < n$, form $M(p)_{abc} = [p_b - p_a,\; p_c - p_a]$, a $2 \times 2$ edge matrix.
  • Normalized Determinant Matrix: Each triangle's signed, normalized area $D(p)_{abc}$, defined as

$$D(p)_{abc} = \frac{4\, \det\left(\begin{bmatrix} p_b.x - p_a.x & p_c.x - p_a.x \\ p_b.y - p_a.y & p_c.y - p_a.y \end{bmatrix}\right)}{\lambda(p)^2}$$

encodes scale-invariant geometric features.

  • Degeneracy and Glyph Dimensionality: Writing $G_{abc} = D(g)_{abc}$ for the input cloud $g$, the condition $\max_{abc} |G_{abc}| < \varepsilon_1$ (with $\varepsilon_1 \approx 0.004$) designates the glyph as essentially 1-D ("line glyph"); larger values indicate 2-D structure.
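
As a rough sketch of these definitions, the following computes the path length, the normalized determinant for every index triplet, and the line-glyph test. The function names and the dictionary keyed by $(a, b, c)$ are illustrative choices, not taken from the source; the cloud is assumed to have non-zero path length.

```python
import math
from itertools import combinations

def path_length(p):
    """Total polyline length: sum of distances between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(p, p[1:]))

def normalized_determinants(p):
    """Map each index triplet (a, b, c), a < b < c, to D(p)_abc."""
    lam2 = path_length(p) ** 2
    D = {}
    for a, b, c in combinations(range(len(p)), 3):
        ux, uy = p[b][0] - p[a][0], p[b][1] - p[a][1]  # edge p_b - p_a
        vx, vy = p[c][0] - p[a][0], p[c][1] - p[a][1]  # edge p_c - p_a
        D[(a, b, c)] = 4.0 * (ux * vy - vx * uy) / lam2
    return D

def is_line_glyph(D, eps1=0.004):
    """True when every triangle is nearly flat, i.e. the glyph is essentially 1-D."""
    return max(abs(d) for d in D.values()) < eps1

# Usage: scaling the cloud leaves the normalized determinants unchanged.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
scaled = [(5 * x, 5 * y) for x, y in square]
D_small = normalized_determinants(square)
D_large = normalized_determinants(scaled)
```

Dividing by $\lambda(p)^2$ is what cancels uniform scale: areas and squared lengths grow by the same factor.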

3. Candidate Triangle Selection and Robust Alignment

To optimize alignment quality and computational efficiency:

  • Triangle Robustness Filtering: Among all possible index triplets, select $m$ triangles (commonly $m = 10$) with the highest $|G_{abc}|$ values. This maximizes area robustness and minimizes degeneracies (pivot-select prioritizes efficiency over full sorting).
  • Candidate Alignment Generation: Each chosen triangle $[a,b,c]$ serves as a frame for affine transformation estimation between point clouds.
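
A minimal sketch of the selection step is shown below. It swaps in Python's heapq.nlargest as a stand-in for the pivot-select mentioned above: both avoid fully sorting all triplets, but they are not the same algorithm. The function name and the dictionary layout (triplet to normalized determinant) are assumptions made for illustration.

```python
import heapq

def select_triangles(G, m=10):
    """Return the m index triplets with the largest |G_abc|, largest first."""
    return heapq.nlargest(m, G, key=lambda abc: abs(G[abc]))

# Usage with a tiny hand-written determinant table.
G = {(0, 1, 2): 0.1, (0, 1, 3): -0.5, (0, 2, 3): 0.3, (1, 2, 3): 0.0}
top2 = select_triangles(G, 2)
```

Ranking by absolute value keeps large triangles of either orientation; the sign is only consulted later, when mirrored matches are allowed or rejected.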

4. Affine Map Construction and Metric Evaluation

Affine transformations are constructed from triangle correspondences:

  • Transformation Matrix Construction: Using homogeneous coordinates, the $3 \times 3$ matrix for points $[a,b,c]$ in cloud $p$ is:

$$\hat{p}_{abc} = \begin{bmatrix} p_b.x - p_a.x & p_c.x - p_a.x & p_a.x \\ p_b.y - p_a.y & p_c.y - p_a.y & p_a.y \\ 0 & 0 & 1 \end{bmatrix}$$

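  • Map Application: For input $g$ and template $h$, and triangle $[a,b,c]$:

$$T_{abc} = \hat{g}_{abc} \cdot (\hat{h}_{abc})^{-1}$$

With column vectors, $(\hat{h}_{abc})^{-1}$ takes the template triangle to the unit frame and $\hat{g}_{abc}$ takes the unit frame to the input triangle, so $T_{abc}$ carries $h_a, h_b, h_c$ onto $g_a, g_b, g_c$. Then, each template point $h_i$ is projected as $r_i = T_{abc} \cdot [h_i.x;\; h_i.y;\; 1]$.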

  • Error Metric: The sum-of-squared-error (without square-root) is computed:

$$\text{metric}(g, r) = \sum_{i=0}^{n-1} \left[(g_i.x - r_i.x)^2 + (g_i.y - r_i.y)^2\right]$$

preserving ordering and computational efficiency.
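
The map construction and metric can be illustrated with the sketch below. It assumes the column-vector convention, under which the map that sends the template triangle onto the input triangle is $\hat{g}_{abc}$ applied after the inverse of $\hat{h}_{abc}$. Rather than forming $3 \times 3$ matrices, the product is expanded by hand into a $2 \times 2$ solve plus a translation. The function names project_template and metric are illustrative, and the template triangle is assumed non-degenerate.

```python
def project_template(g, h, abc):
    """Project every template point h_i into the input's frame via triangle abc."""
    a, b, c = abc
    g1 = (g[b][0] - g[a][0], g[b][1] - g[a][1])  # input edge g_b - g_a
    g2 = (g[c][0] - g[a][0], g[c][1] - g[a][1])  # input edge g_c - g_a
    h1 = (h[b][0] - h[a][0], h[b][1] - h[a][1])  # template edge h_b - h_a
    h2 = (h[c][0] - h[a][0], h[c][1] - h[a][1])  # template edge h_c - h_a
    det = h1[0] * h2[1] - h2[0] * h1[1]          # must be non-zero
    out = []
    for x, y in h:
        dx, dy = x - h[a][0], y - h[a][1]
        # Coordinates (u, v) of the point in the template triangle's frame.
        u = (dx * h2[1] - h2[0] * dy) / det
        v = (h1[0] * dy - dx * h1[1]) / det
        # Re-express (u, v) in the input triangle's frame.
        out.append((g[a][0] + u * g1[0] + v * g2[0],
                    g[a][1] + u * g1[1] + v * g2[1]))
    return out

def metric(g, r):
    """Sum of squared point-to-point distances (no square root)."""
    return sum((gx - rx) ** 2 + (gy - ry) ** 2
               for (gx, gy), (rx, ry) in zip(g, r))

# Usage: an input that is an affine image of the template projects back onto itself.
h = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 3.0), (1.0, 4.0)]
g = [(2 * x + 0.5 * y + 3, -x + 1.5 * y - 2) for x, y in h]
r = project_template(g, h, (0, 1, 2))
error = metric(g, r)  # close to zero, up to floating-point rounding
```

Skipping the square root does not change which template ranks lowest, which is why the metric can stay a plain sum of squares.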

5. Recognition Pipeline and Invariance Properties

The recognition process iterates over template glyphs and candidate triangles, enforcing invariance and optimizing match quality:

  • Recognition Loop Core: For each candidate triangle and template:
    • Dimensionality Consistency: Skip if input and template differ in 1-D/2-D classification.
    • Degeneracy Checks: Discard nearly degenerate triangles ($|nd_g| < \varepsilon_2$ or $|nd_h| < \varepsilon_2$), where $nd_g$ and $nd_h$ are the normalized determinants of the current triangle in the input and the template.
    • Reflection Control: If $\text{sign}(nd_g \cdot nd_h) < 0$ and the template prohibits mirroring, skip.
    • Orientation Constraints: Optionally restrict on orientation similarity using

    $$\text{triSimilarity}(g, h, [a,b,c]) = \cos\theta_{g_a g_b, h_a h_b} + \cos\theta_{g_b g_c, h_b h_c} + \cos\theta_{g_c g_a, h_c h_a}$$

    • Affine Mapping, Projection, Metric: Construct affine map, transform template, compute metric, and retain the template yielding lowest error.

  • Invariance Guarantees:

    • Translation: $p_a$ anchors the transformation.
    • Rotation, Scaling, Skew: Encoded in $[p_b - p_a,\; p_c - p_a]$.
    • Reflection: Determinant sign changes under mirroring, with explicit logic to allow or bar mirrored matches.
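
Putting the checks together, a compact sketch of the recognition loop might look like the following. Several details are assumptions rather than facts from the source: the template store is a dictionary mapping a name to (cloud, allow_mirror); the value of eps2 is a placeholder, since the text does not give one; a full sort replaces pivot-select for brevity; the optional orientation constraint is omitted; and 1-D glyphs are not matched, because the text does not describe how they are handled. Input and templates are assumed to have the same number of points.

```python
import math
from itertools import combinations

def norm_dets(p):
    """Normalized signed determinant for every index triplet a < b < c."""
    lam2 = sum(math.dist(u, v) for u, v in zip(p, p[1:])) ** 2
    return {(a, b, c): 4.0 * ((p[b][0] - p[a][0]) * (p[c][1] - p[a][1])
                              - (p[c][0] - p[a][0]) * (p[b][1] - p[a][1])) / lam2
            for a, b, c in combinations(range(len(p)), 3)}

def project(g, h, abc):
    """Map template points through the affine map taking h's triangle onto g's."""
    a, b, c = abc
    g1 = (g[b][0] - g[a][0], g[b][1] - g[a][1])
    g2 = (g[c][0] - g[a][0], g[c][1] - g[a][1])
    h1 = (h[b][0] - h[a][0], h[b][1] - h[a][1])
    h2 = (h[c][0] - h[a][0], h[c][1] - h[a][1])
    det = h1[0] * h2[1] - h2[0] * h1[1]
    out = []
    for x, y in h:
        dx, dy = x - h[a][0], y - h[a][1]
        u = (dx * h2[1] - h2[0] * dy) / det
        v = (h1[0] * dy - dx * h1[1]) / det
        out.append((g[a][0] + u * g1[0] + v * g2[0],
                    g[a][1] + u * g1[1] + v * g2[1]))
    return out

def recognize(g, templates, m=10, eps1=0.004, eps2=1e-3):
    """Return (best_name, best_error); eps2 is a hypothetical threshold."""
    G = norm_dets(g)
    g_is_line = max(abs(d) for d in G.values()) < eps1
    candidates = sorted(G, key=lambda t: abs(G[t]), reverse=True)[:m]
    best = (None, math.inf)
    for name, (h, allow_mirror) in templates.items():
        H = norm_dets(h)
        if (max(abs(d) for d in H.values()) < eps1) != g_is_line:
            continue  # 1-D vs 2-D mismatch
        for t in candidates:
            nd_g, nd_h = G[t], H[t]
            if abs(nd_g) < eps2 or abs(nd_h) < eps2:
                continue  # nearly degenerate triangle
            if nd_g * nd_h < 0 and not allow_mirror:
                continue  # mirrored match not permitted for this template
            r = project(g, h, t)
            err = sum((gx - rx) ** 2 + (gy - ry) ** 2
                      for (gx, gy), (rx, ry) in zip(g, r))
            if err < best[1]:
                best = (name, err)
    return best

# Usage: a rotated, scaled, translated copy of template "A" is matched to "A".
A = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 3.0), (1.0, 4.0)]
B = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0), (3.0, 2.0), (4.0, 0.0)]
drawn = [(-2 * y + 5, 2 * x + 1) for x, y in A]
name, err = recognize(drawn, {"A": (A, False), "B": (B, False)})
```

In a deployed recognizer the per-template determinants would be computed once and stored rather than recomputed on every call, as this sketch does.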

6. Computational Properties and Empirical Results

The computational and empirical performance of the Glyph2Cloud pipeline is characterized by:

  • Complexity: For $n=16$, $T=32$, $m=10$:
    • $O(n^3)$ determinant computations ($560$ for $n=16$) for $G_{abc}$.
    • Expected $O(n^3)$ for robust triangle selection without full sorting.
    • Per-recognition cost: $O(n^3 + T \cdot m \cdot n)$, or $30\,000$ to $40\,000$ multiply-accumulate operations.
  • Latency: Achieves sub-millisecond recognition on modern CPUs. JavaScript implementation reports $\sim 0.66$ ms per gesture (aggregate: $3.3$ s for $4\,950$ gestures).
  • Accuracy (on \$1 dataset, $4\,950$ gestures, 15 templates):

| Recognizer | Accuracy (%) | Runtime (s, total) |
|--------------------|-------------|--------------------|
| \$1 Recognizer | 95.56 | 2.038 |
| Squiggle | 95.09 | 3.292 |
| Protractor | 92.87 | 0.254 |

  • Decision Correlation:
    • Squiggle vs \$1 Recognizer: 94.85%
    • Squiggle vs Protractor: 92.77%
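
The complexity and latency figures above can be sanity-checked with a few lines of arithmetic. The count of six multiply-accumulates per projected point (four for the affine map, two for the squared error) is an assumption introduced here for illustration, not a number from the source; the other inputs are the values quoted above.

```python
from math import comb

n, T, m = 16, 32, 10             # milestone points, templates, candidate triangles

triangles = comb(n, 3)           # index triplets a < b < c
projections = T * m * n          # template points projected per recognition

MACS_PER_POINT = 6               # assumed: 4 for the affine map + 2 for squared error
estimate = projections * MACS_PER_POINT

per_gesture_ms = 3.292 / 4950 * 1000   # Squiggle total runtime over 4,950 gestures

print(triangles, projections, estimate, round(per_gesture_ms, 3))
```

Under that assumption the projection term alone comes to 30,720 multiply-accumulates, which falls inside the quoted range, and the table's total runtime works out to roughly 0.665 ms per gesture.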

7. Practical Usage and Real-Time Feedback

Because the Squiggle-based Glyph2Cloud module operates in screen coordinates and applies the same kind of affine transforms that rendering pipelines use, it naturally supports overlaying visual template "shadows" during drawing. This enables real-time, accurate feedback for user gestures, exploiting the direct geometry-to-visual mapping. Template preprocessing ($H_{abc}$ storage) and modular pipeline composition further simplify deployment in gesture-based UI systems (Lee, 2011).
