
Hieroglyphic Stroke Analyzer (HieroSA)

Updated 16 January 2026
  • Hieroglyphic Stroke Analyzer (HieroSA) is a reinforcement learning framework that extracts normalized stroke primitives from binarized glyph images without manual annotation.
  • It utilizes coordinate normalization and Group Relative Policy Optimization to accurately convert glyph structures into explicit line-segment representations.
  • The framework outperforms conventional models in coverage and validity, enhancing OCR accuracy and enabling unsupervised analysis of diverse scripts.

Hieroglyphic Stroke Analyzer (HieroSA) is a generalizable reinforcement learning-based framework for character-level structural analysis of logographic and hieroglyphic scripts. It enables Multimodal LLMs (MLLMs) to derive explicit stroke-level decompositions from character bitmaps without manual annotation or language-specific priors, representing glyphs as interpretable line segments in a normalized coordinate system. HieroSA bypasses the limitations of conventional LLM and MLLM approaches, which treat characters as text tokens or raw images without explicit modeling of their internal geometric and compositional logic. The framework has demonstrated strong performance in capturing structural and semantic properties of hieroglyphs across diverse scripts, including ancient Chinese Oracle Bone Script (OBS), Egyptian hieroglyphs, modern Chinese, and Japanese Kanji (Luo et al., 9 Jan 2026).

1. Motivation and Problem Context

Logographic and hieroglyphic writing systems encode information not only in glyph identity but also in their internal stroke arrangement, orientation, and connectivity. This structural composition often maps directly to semantic and cultural functions within a script. However, current LLMs disregard stroke geometry by operating at the text token level, and MLLMs process glyphs only as pixel grids, remaining “structurally blind” to stroke primitives. Previous stroke-based analyses typically require script-specific inventories or labor-intensive annotation, limiting generalization to lesser-documented or unknown scripts. HieroSA addresses this bottleneck by offering a method to recover geometric stroke representations directly from binarized character bitmaps, without handcrafted data or annotated traces.

2. Methodological Framework

HieroSA operates as a reinforcement-learning pipeline that converts a binarized glyph image, formatted as black strokes on a white background, into a sequence of line-segment stroke primitives in normalized coordinates. The sequence of operations includes:

  1. Binarization of the input glyph image.
  2. Overlaying a coordinate grid to facilitate stable regression of endpoint locations.
  3. Encoding the image and grid within the Qwen3-VL-4B-Instruct vision-language backbone.
  4. Autoregressive prediction of stroke primitives, each parametrized by two endpoints $\mathbf p_s, \mathbf p_e \in [-1,1]^2$.
  5. Computation of a reward measuring spatial coverage of predicted strokes over black-pixel regions, plus a format-conformity bonus.
  6. Inference parsing of model outputs into an explicit set $\{(\mathbf p_s^k, \mathbf p_e^k)\}_{k=1}^n$ for downstream structural analysis.
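Step 6 can be sketched as a small parser that either returns the segment set or signals a format error. The source does not specify the textual output format. The JSON layout below, along with the names `parse_strokes`, `Point`, and `Segment`, is a hypothetical choice for illustration only.

```python
import json
from typing import Optional

Point = tuple[float, float]
Segment = tuple[Point, Point]


def parse_strokes(text: str) -> Optional[list[Segment]]:
    """Parse model output into [(p_s, p_e), ...] or return None on a format error.

    Hypothetical format (the source does not specify one):
    [[[x_s, y_s], [x_e, y_e]], ...] with every coordinate in [-1, 1].
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, list):
        return None
    segments: list[Segment] = []
    for stroke in data:
        # Each stroke must consist of exactly two endpoints.
        if not (isinstance(stroke, list) and len(stroke) == 2):
            return None
        endpoints: list[Point] = []
        for point in stroke:
            # Each endpoint must be a pair of plain numbers (booleans rejected).
            if not (isinstance(point, list) and len(point) == 2):
                return None
            if any(isinstance(c, bool) or not isinstance(c, (int, float)) for c in point):
                return None
            x, y = float(point[0]), float(point[1])
            # Endpoints must lie in the normalized square [-1, 1]^2.
            if not (-1.0 <= x <= 1.0 and -1.0 <= y <= 1.0):
                return None
            endpoints.append((x, y))
        segments.append((endpoints[0], endpoints[1]))
    return segments
```

A `None` result is one natural hook for the format reward (for example, $r_f = 0$ for unparsable output). That mapping is also an assumption, not the paper's stated rule.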

The framework employs Group Relative Policy Optimization (GRPO) to maximize the reward, which balances coverage accuracy against conformity to the output format. Overlaying a faint coordinate grid assists the model in localizing endpoints, as confirmed by ablation.
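GRPO's defining step is computing advantages relative to a group of rollouts for the same prompt rather than from a learned value function. The sketch below shows only that normalization, assuming one group per glyph image. The function name and the reward values are hypothetical. The clipped policy-gradient and KL terms of full GRPO are omitted.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages for one group of rollouts sharing the same glyph prompt.

    Each rollout's reward is compared with its own group: A_i = (r_i - mean) / (std + eps).
    The population standard deviation is used here; implementations differ on this choice.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for 8 rollouts on one glyph (rollout size 8, as in Section 4).
advantages = group_relative_advantages([0.81, 0.75, 0.40, 0.62, 0.10, 0.77, 0.55, 0.69])
```

Because advantages are centered within each group, a decomposition is reinforced only insofar as it covers the glyph better than sibling rollouts for the same image. If every rollout scores equally, the advantages are all zero.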

3. Mathematical Formulation

Coordinate normalization maps pixel positions $(u, v)$ in an image of width $W$ and height $H$ into $[-1,1]^2$ via $x = 2\frac{u}{W} - 1$, $y = 2\frac{v}{H} - 1$, so that all stroke endpoints $\mathbf p$ are expressed in a canonical square. Glyphs are represented as sets of $n$ line segments $\mathcal S = \{(\mathbf p_s^k, \mathbf p_e^k)\}_{k=1}^n$, which allows curved features to be approximated by short consecutive segments.
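The normalization and its inverse are a direct transcription of the formulas above. The function names are illustrative. If $v$ counts rows from the top of the image, $y$ increases downward.

```python
def pixel_to_normalized(u: float, v: float, width: int, height: int) -> tuple[float, float]:
    """Map pixel coordinates (u, v) to [-1, 1]^2 via x = 2u/W - 1, y = 2v/H - 1."""
    return 2.0 * u / width - 1.0, 2.0 * v / height - 1.0


def normalized_to_pixel(x: float, y: float, width: int, height: int) -> tuple[float, float]:
    """Inverse map: u = (x + 1) W / 2, v = (y + 1) H / 2."""
    return (x + 1.0) * width / 2.0, (y + 1.0) * height / 2.0


# The image center maps to the origin, and the pixel corner (0, 0) maps to (-1, -1).
center = pixel_to_normalized(128, 128, 256, 256)
```

Because the map is independent of resolution, the same endpoint predictions apply to glyph bitmaps of any size.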

Stroke validation samples $m$ equidistant points along each segment, $\mathbf p_i = \frac{i\,\mathbf p_s + (m+1-i)\,\mathbf p_e}{m+1}$, with $m$ chosen so that consecutive sample points are at most $D$ apart. A stroke with any sampled point outside the black-pixel region $\Omega_B$ is marked invalid. Coverage is estimated by computing the tangent and normal vectors at each sample point and extending along the normal directions until the background is reached. These normal extensions are truncated using $\lambda\bar d$, and the stroke is extended tangentially by $\ell_s$ and $\ell_e$ at its two ends. Each stroke thus yields an approximating polygon $\mathcal C_k$. Strokes are accepted sequentially: a stroke is accepted only if it covers at least $\tau$ of the previously uncovered black pixels.
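The sampling and validity test can be sketched as follows, assuming the glyph is a boolean mask indexed `mask[v][u]` with `True` for black pixels. Checking both endpoints ($i = 0$ and $i = m+1$) and the nearest-pixel lookup are choices made for this sketch. The polygon-based coverage estimate is not reproduced here.

```python
import math

Point = tuple[float, float]


def sample_points(p_s: Point, p_e: Point, D: float = 0.05) -> list[Point]:
    """Samples p_i = (i p_s + (m+1-i) p_e) / (m+1) for i = 0..m+1.

    m is the smallest count that keeps consecutive samples at most D apart.
    Including both endpoints (i = 0 and i = m+1) is an assumption of this sketch.
    """
    length = math.dist(p_s, p_e)
    m = max(0, math.ceil(length / D) - 1)
    n = m + 1
    return [
        ((i * p_s[0] + (n - i) * p_e[0]) / n, (i * p_s[1] + (n - i) * p_e[1]) / n)
        for i in range(n + 1)
    ]


def to_pixel(x: float, y: float, width: int, height: int) -> tuple[int, int]:
    """Invert x = 2u/W - 1, y = 2v/H - 1 and clamp to valid pixel indices."""
    u = min(max(int((x + 1.0) * width / 2.0), 0), width - 1)
    v = min(max(int((y + 1.0) * height / 2.0), 0), height - 1)
    return u, v


def is_valid_stroke(p_s: Point, p_e: Point, mask: list[list[bool]], D: float = 0.05) -> bool:
    """A stroke is invalid if any sample falls outside Omega_B (mask[v][u] True = black)."""
    height, width = len(mask), len(mask[0])
    for x, y in sample_points(p_s, p_e, D):
        u, v = to_pixel(x, y, width, height)
        if not mask[v][u]:
            return False
    return True
```

For a toy 8×8 mask whose columns 3 and 4 are black, a vertical stroke at $x = -0.125$ is valid. A horizontal stroke through the middle is not, because its samples leave the black columns.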

The aggregated stroke-coverage reward is $r_s = \frac{|\mathcal C_{\rm final} \cap \Omega_B|}{|\Omega_B|}\,(1 - \alpha N_{\text{invalid}})$, where $N_{\text{invalid}}$ is the number of invalid strokes. The overall reward $r = r_s + \beta r_f$ adds a weighted format reward $r_f$ and drives GRPO optimization.
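The reward can be sketched over sets of pixel coordinates, assuming each accepted stroke's polygon $\mathcal C_k$ has already been rasterized into a pixel footprint. The source leaves the exact meaning of "at least $\tau$ of the previously uncovered black pixels" open. Here it is read, as an assumption, as the fraction of a stroke's own black footprint that is still uncovered. All function names are hypothetical.

```python
Pixel = tuple[int, int]


def accept_strokes(footprints: list[set[Pixel]], black: set[Pixel], tau: float) -> set[Pixel]:
    """Sequentially accept stroke polygons and return the covered black pixels.

    Assumption: a stroke is accepted when at least a fraction tau of its own
    black-pixel footprint was still uncovered when the stroke is considered.
    """
    covered: set[Pixel] = set()
    for footprint in footprints:
        on_black = footprint & black
        new = on_black - covered
        if on_black and len(new) / len(on_black) >= tau:
            covered |= new
    return covered


def stroke_reward(covered: set[Pixel], black: set[Pixel], n_invalid: int, alpha: float = 0.1) -> float:
    """r_s = |C_final & Omega_B| / |Omega_B| * (1 - alpha * N_invalid)."""
    coverage = len(covered & black) / len(black)
    return coverage * (1.0 - alpha * n_invalid)


def total_reward(r_s: float, r_f: float, beta: float = 0.125) -> float:
    """r = r_s + beta * r_f, with r_f the format reward."""
    return r_s + beta * r_f
```

As written, the factor $1 - \alpha N_{\text{invalid}}$ becomes negative once $N_{\text{invalid}} > 1/\alpha$ (more than 10 invalid strokes at $\alpha = 0.1$). The source does not say whether the reward is clipped in that case.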

4. Empirical Evaluation and Comparative Analysis

The HieroSA framework was trained on 12,000 images each from six Chinese and six Japanese fonts, as well as on publicly available Oracle Bone Script bitmaps, all without stroke annotation. Separate models were trained for each script for two epochs (≈22 h on 8×NVIDIA A800 GPUs) with $D = 0.05$, $\lambda = 1.3$, $\alpha = 0.1$, $\beta = 0.125$, a GRPO batch size of 32, 8 rollouts, and a learning rate of $1\times10^{-6}$. The coordinate grid described in Section 2 is overlaid on the glyph images.
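The reported settings can be collected into a single configuration object. The class and field names are hypothetical. Reading "rollout 8" as the GRPO group size (rollouts per prompt) is an assumption. The helper shows the arithmetic relating $D$ to pixels: the normalized span $[-1, 1]$ has width 2, so $D = 0.05$ corresponds to $0.05 \cdot W / 2$ pixels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HieroSAConfig:
    backbone: str = "Qwen3-VL-4B-Instruct"
    D: float = 0.05             # max spacing between validation samples (normalized units)
    lam: float = 1.3            # lambda, factor in the lambda * d_bar truncation
    alpha: float = 0.1          # invalid-stroke penalty
    beta: float = 0.125         # format-reward weight
    batch_size: int = 32        # GRPO batch size
    rollouts: int = 8           # assumed: rollouts sampled per prompt (group size)
    learning_rate: float = 1e-6
    epochs: int = 2

    def spacing_in_pixels(self, image_width: int) -> float:
        # One normalized unit spans W / 2 pixels because [-1, 1] has width 2.
        return self.D * image_width / 2.0


cfg = HieroSAConfig()
```

For a hypothetical 256-pixel-wide bitmap, `cfg.spacing_in_pixels(256)` gives 6.4 pixels between validation samples.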

Testing used 1,000 images per script from unseen fonts and reported three metrics: RE (test-time reward), CO (%) (percentage of black-pixel area covered), and IS (%) (percentage of predicted strokes that are invalid). HieroSA was compared with GPT-5, Claude Sonnet 4, and Qwen3-VL-4B used as zero-shot stroke parsers.
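The three metrics can be sketched from per-image counts. The source does not say whether scores are averaged per image or pooled over the test set. The per-image mean below, the zero-stroke convention, and the names `ImageResult` and `summarize` are assumptions.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class ImageResult:
    reward: float       # test-time reward (RE) for one glyph image
    covered_black: int  # black pixels covered by accepted strokes
    total_black: int    # |Omega_B|, all black pixels in the image
    n_invalid: int      # strokes with a sample outside Omega_B
    n_strokes: int      # all predicted strokes


def summarize(results: list[ImageResult]) -> tuple[float, float, float]:
    """Return (RE, CO %, IS %) as per-image means over a test set (an assumption)."""
    mean_reward = fmean(r.reward for r in results)
    coverage_pct = fmean(100.0 * r.covered_black / r.total_black for r in results)
    # An image with no predicted strokes is counted as IS = 0 here (a convention choice).
    invalid_pct = fmean(100.0 * r.n_invalid / r.n_strokes if r.n_strokes else 0.0 for r in results)
    return mean_reward, coverage_pct, invalid_pct
```

Pooling counts across images would weight glyphs with many black pixels or many strokes more heavily. The choice matters when comparing against the reported numbers.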

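| Model (script) | RE | CO (%) | IS (%) |
|---|---|---|---|
| GPT-5 (ZH) | 0.133 | 3.6 | 88.2 |
| Qwen3-VL-4B (ZH) | 0.032 | 0.5 | 97.9 |
| HieroSA (ZH) | 0.837 | 78.5 | 6.1 |
| HieroSA (JA) | 0.756 | 72.2 | 10.2 |
| HieroSA (OBS) | 0.446 | 64.6 | 23.1 |

Scripts: ZH = Chinese, JA = Japanese, OBS = Oracle Bone Script.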

HieroSA outperformed the baseline models by more than 60 percentage points in coverage and reduced the invalid-stroke rate by more than 80 percentage points. Cross-script transfer (e.g., ZH→JA) also retained robust performance.
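As a quick arithmetic check, the margins can be computed from the ZH rows of the results table, the only rows with baseline scores listed here. The dictionary simply transcribes those rows, and the function name is illustrative.

```python
# CO (%) and IS (%) on the Chinese (ZH) test set, copied from the results table.
ZH_RESULTS = {
    "GPT-5": {"CO": 3.6, "IS": 88.2},
    "Qwen3-VL-4B": {"CO": 0.5, "IS": 97.9},
    "HieroSA": {"CO": 78.5, "IS": 6.1},
}


def margins(results: dict[str, dict[str, float]], ours: str = "HieroSA") -> dict[str, tuple[float, float]]:
    """Percentage-point gain in CO and drop in IS of `ours` over every other model."""
    best = results[ours]
    return {
        name: (round(best["CO"] - r["CO"], 1), round(r["IS"] - best["IS"], 1))
        for name, r in results.items()
        if name != ours
    }


for name, (co_gain, is_drop) in margins(ZH_RESULTS).items():
    print(f"{name}: CO +{co_gain} pp, IS -{is_drop} pp")
# GPT-5: CO +74.9 pp, IS -82.1 pp
# Qwen3-VL-4B: CO +78.0 pp, IS -91.8 pp
```

The smallest gaps (74.9 pp in CO and 82.1 pp in IS, both against GPT-5) are consistent with the "more than 60" and "more than 80" percentage-point figures.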

5. Ablation Studies and Analysis

Ablation experiments detailed the effects of key hyperparameters:

  • Invalid-stroke penalty $\alpha$: $\alpha = 0$ yielded a high invalid-stroke rate (IS 64%, CO 42%); $\alpha = 0.1$ gave a balanced outcome (CO 22%, IS 45%); and $\alpha = 0.5$ penalized excessively (CO 2.7%, IS 75.7%).
  • Points per stroke: the best decomposition used two points per stroke (the endpoints); three or four points degraded performance (lower RE and CO, higher IS).
  • Format reward weight $\beta$: optimal at $\beta = 0.125$; too small a $\beta$ led to output-parsing errors, while too large a $\beta$ diverted the model's attention from coverage.
  • Coordinate overlay: inclusion increased coverage by ∼10% and decreased invalid strokes by ∼5%.

6. Qualitative Observations and Representational Outcomes

HieroSA provided consistent and interpretable line-segment decompositions across scripts:

  • Chinese “日”: four orthogonal segments aligned with the character's geometric structure.
  • Japanese Kanji “木”: vertical, horizontal, and two diagonal segments reflecting the character's semantic skeleton.
  • Oracle Bone Script: highly pictographic glyphs parsed into a short sequence of segments tracing their main contours.

Figure 1 of Luo et al. (9 Jan 2026) illustrates the segmentation of an oracle bone glyph in normalized coordinate space, suggesting that the approach applies even to highly irregular and archaic writing forms.

7. Limitations, Generalization, and Future Directions

HieroSA dispenses with script-specific priors and stroke inventories, which supports its application to under-documented scripts such as Dongba or Egyptian hieroglyphs. Performance varies with glyph complexity and geometric noise, and the current training data and model scale cover only moderate diversity. Proposed extensions include structure-aware denoising, larger or ensemble models for improved stability, stepwise filtering or ranking during exploration, and spline-based primitives to better model curvature.

A plausible implication is that the explicit structuring of glyphs at the stroke level facilitates downstream applications: improved Optical Character Recognition (OCR) accuracy (+1 percentage point) and structure-guided retrieval of semantically related glyphs across heterogeneous scripts. This positions HieroSA as a potential core tool in graphematics and unsupervised script analysis (Luo et al., 9 Jan 2026).
