Labeled Polygon Sequence Representation
- Labeled Polygon Sequence Representation is a formalism that encodes structured objects as ordered sequences of vertices with semantic annotations, supporting flexible polygon counts and vertex cardinality.
- It leverages geometric quantization, embedding techniques, and autoregressive decoders to model spatial configurations in tasks like floorplan reconstruction and instance segmentation.
- The approach integrates attention guidance, anchor mechanisms, and multi-part loss functions to refine predictions and enforce consistent polygon ordering.
A labeled polygon sequence representation encodes structured objects—such as floorplan regions, detected objects, or discrete code sequences—as an ordered sequence of geometric vertices, combined with semantic or categorical labels per vertex or polygon. This formalism underlies diverse recent advances in vector-graphics extraction from images, polygonal object detection, error-correcting code visualization, and sequence-to-sequence modeling of geometric shapes. The representation supports variable numbers of polygons, flexible vertex cardinality, semantic annotations, and efficient autoregressive or set-based generation mechanisms.
1. Formal Definition and Tokenization
For tasks such as floorplan vectorization or instance segmentation, a scene is decomposed into a sequence of labeled polygons, each represented as an ordered list of vertices. In \emph{Raster2Seq} for floorplan reconstruction, for example, each corner token carries:
- $v_i \in [0,1]^2$: continuous 2D coordinates of vertex $i$, normalized to image or feature-map dimensions
- $c_i$: a one-hot or probabilistic $K$-dimensional semantic label (e.g., room type)
Tokenization involves linearizing the entire set of polygons (e.g., all rooms followed by windows and doors) into a single sequence, bracketed by symbolic tokens. The canonical structure is:
$$\texttt{<BOS>},\; v^{(1)}_1, \dots, v^{(1)}_{n_1},\; \texttt{<SEP>},\; v^{(2)}_1, \dots, v^{(2)}_{n_2},\; \texttt{<SEP>},\; \dots,\; \texttt{<EOS>}$$
where $n_k$ is the number of vertices in polygon $k$; \texttt{<SEP>} and \texttt{<EOS>} mark polygon boundaries and sequence termination (Phung et al., 9 Feb 2026, Liu et al., 2023).
Analogous structures appear in object detection and instance segmentation methods, with coordinate tokens and separator tokens encoded consistently (Liu et al., 2023, Zheng et al., 2023).
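The linearization above can be sketched in a few lines; the token names follow the \texttt{<BOS>}/\texttt{<SEP>}/\texttt{<EOS>} convention described here, but the exact token vocabulary and ordering policy of Raster2Seq are assumptions:

```python
BOS, SEP, EOS = "<BOS>", "<SEP>", "<EOS>"

def linearize(polygons):
    """polygons: list of (label, [(x, y), ...]) pairs.

    Each vertex becomes an (x, y, label) triple; polygons are joined
    by <SEP>, and the sequence is bracketed by <BOS>/<EOS>."""
    tokens = [BOS]
    for i, (label, vertices) in enumerate(polygons):
        if i > 0:
            tokens.append(SEP)
        tokens.extend((x, y, label) for x, y in vertices)
    tokens.append(EOS)
    return tokens

seq = linearize([
    ("bedroom", [(12.3, 45.1), (32.7, 45.1), (32.7, 65.4), (12.3, 65.4)]),
    ("window",  [(25.0, 45.0), (35.0, 45.0)]),
])
# seq has 9 tokens: <BOS>, 4 bedroom vertices, <SEP>, 2 window vertices, <EOS>
```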
2. Geometric Quantization, Embedding, and Representation
Polygonal Geometry
Continuous vertex coordinates are typically embedded into a high-dimensional feature space via quantization and bilinear interpolation over a learnable codebook. For Raster2Seq and PolyFormer:
- Vertices are normalized to image bounds and mapped via bilinear interpolation among nearby codebook entries:
$$e(v) = \sum_{j} w_j \, C_j$$
where the sum runs over the grid entries $C_j$ of a learnable codebook $C$ nearest to $v$, and $w_j$ are interpolation weights determined by the point's fractional coordinates (Phung et al., 9 Feb 2026, Liu et al., 2023).
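A minimal sketch of this embedding step, assuming the codebook is a square grid of $d$-dimensional vectors (grid size and layout are illustrative, not the papers' settings):

```python
import math

def embed_vertex(x, y, codebook, grid):
    """Bilinearly interpolate among the 4 codebook entries surrounding
    the normalized point (x, y) in [0, 1]^2. `codebook` is a
    grid x grid nested list of d-dim vectors (stand-in for a learnable table)."""
    gx, gy = x * (grid - 1), y * (grid - 1)
    x0, y0 = int(math.floor(gx)), int(math.floor(gy))
    x1, y1 = min(x0 + 1, grid - 1), min(y0 + 1, grid - 1)
    fx, fy = gx - x0, gy - y0
    # Weights from the point's fractional coordinates within its grid cell.
    corners = [
        ((1 - fx) * (1 - fy), codebook[y0][x0]),
        (fx * (1 - fy),       codebook[y0][x1]),
        ((1 - fx) * fy,       codebook[y1][x0]),
        (fx * fy,             codebook[y1][x1]),
    ]
    d = len(codebook[0][0])
    return [sum(w * e[k] for w, e in corners) for k in range(d)]

# 2x2 codebook of 1-dim entries; the center point averages all four.
codebook = [[[0.0], [1.0]], [[2.0], [3.0]]]
emb = embed_vertex(0.5, 0.5, codebook, grid=2)
```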
In DPPD, vertices are parameterized in polar coordinates relative to an inferred object center $(x_c, y_c)$, with each vertex represented as $(\rho_i, \theta_i)$. Angle tokens are produced via a cumulative softmax, ensuring counter-clockwise, non-overlapping ordering (Zheng et al., 2023).
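The cumulative-softmax ordering can be sketched as follows; mapping the cumulative sum to $(0, 2\pi]$ is an illustrative choice, and the exact scaling in DPPD may differ:

```python
import math

def cumulative_softmax_angles(logits):
    """Map unconstrained logits to strictly increasing angles in (0, 2*pi].
    Softmax yields positive increments summing to 1, so the cumulative
    sum is monotone: angles come out counter-clockwise and non-overlapping."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]  # shifted for numerical stability
    total = sum(exps)
    angles, acc = [], 0.0
    for e in exps:
        acc += e / total
        angles.append(2 * math.pi * acc)
    return angles

angles = cumulative_softmax_angles([0.0, 0.0, 0.0, 0.0])
# equal logits -> equally spaced angles: pi/2, pi, 3*pi/2, 2*pi
```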
Semantic Labels
Semantic information is carried in the one-hot/probabilistic label vector $c_i$ attached to each vertex, or encoded separately for the polygon. At inference, semantic assignment is typically $\hat{k} = \arg\max_k c_{i,k}$; at training, a cross-entropy loss is applied for supervision (Phung et al., 9 Feb 2026).
Special Tokens
Sequences always begin with \texttt{<BOS>} and end with \texttt{<EOS>}. \texttt{<SEP>} tokens split polygons. Token classes (per-step) are predicted via a classification head (Phung et al., 9 Feb 2026, Liu et al., 2023).
3. Autoregressive and Set-Based Generation
State-of-the-art models employ autoregressive decoders (e.g., Transformers) to produce labeled polygon sequences, conditioned on past outputs and image features:
- At step $t$, the decoder predicts the next vertex coordinates and label from the previously generated tokens and the encoded image features $\mathbf{F}$, modeling $p(v_t, c_t \mid v_{<t}, c_{<t}, \mathbf{F})$.
- Causal masking ensures the decoder attends only to previously generated content.
- During inference, generation proceeds sequentially until an \texttt{<EOS>} token is produced (Phung et al., 9 Feb 2026, Liu et al., 2023).
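The sequential generation loop can be sketched as below; `decoder_step` is a hypothetical stand-in for one causal transformer decoding step conditioned on image features:

```python
BOS, EOS = "<BOS>", "<EOS>"

def generate(decoder_step, image_features, max_len=256):
    """Greedy autoregressive decoding: feed back all previously
    generated tokens until <EOS> (or a length cap) is reached."""
    tokens = [BOS]
    while len(tokens) < max_len:
        nxt = decoder_step(tokens, image_features)  # causal: sees only the past
        tokens.append(nxt)
        if nxt == EOS:
            break
    return tokens

# Toy "decoder" that scripts a fixed two-vertex polygon, then terminates.
script = [(0.1, 0.2, "room"), (0.3, 0.2, "room"), EOS]
step = lambda toks, feats: script[len(toks) - 1]
out = generate(step, image_features=None)
```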
DPPD departs from strict sequence modeling, instead predicting a sparse, unordered set of vertices with explicit postprocessing to enforce consistent ordering (Zheng et al., 2023).
4. Anchor Mechanisms and Attention Guidance
Attention-based decoders in labeled polygon sequence models utilize spatial anchor mechanisms to refine geometric precision and focus:
- Each token position $t$ is associated with a 2D anchor $a_t$, initialized randomly and learned during training.
- Rather than directly outputting the vertex, the model predicts an offset $\Delta_t$ from the anchor, so $v_t = a_t + \Delta_t$.
- In transformer-based decoders, deformable attention is used: for query $q_t$, a small MLP predicts sampling offsets around anchor $a_t$, and attention is restricted to those spatial positions on the image feature map. This spatial focus yields more efficient and accurate processing of complex geometry (Phung et al., 9 Feb 2026).
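The anchor-plus-offset parameterization can be sketched as follows; the tanh squashing and the `scale` bound are illustrative stabilizing assumptions, not the paper's exact parameterization:

```python
import math

def vertex_from_anchor(anchor, raw_offset, scale=0.1):
    """Predict a vertex as anchor + bounded offset: the raw network
    output is squashed through tanh so the vertex stays within
    `scale` of its learned anchor in each coordinate."""
    return tuple(a + scale * math.tanh(o) for a, o in zip(anchor, raw_offset))

v_centered = vertex_from_anchor((0.5, 0.5), (0.0, 0.0))   # zero offset -> anchor itself
v_saturated = vertex_from_anchor((0.5, 0.5), (100.0, -100.0))  # offsets saturate at +/-scale
```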
5. Loss Functions and Training
Labeled polygon sequence models impose multiple supervisory losses:
- Coordinate regression loss: $\mathcal{L}_{\text{coord}} = \lVert v_t - \hat{v}_t \rVert_1$ for vertex localization.
- Semantic classification loss: $\mathcal{L}_{\text{sem}} = \mathrm{CE}(c_t, \hat{c}_t)$ for semantic consistency, with special/padding tokens masked out.
- Token-type loss: $\mathcal{L}_{\text{type}} = \mathrm{CE}(\tau_t, \hat{\tau}_t)$, where $\tau_t$ indicates the token class (\texttt{<CORNER>}, \texttt{<SEP>}, \texttt{<EOS>}).
- The total loss is a weighted sum: $\mathcal{L} = \lambda_{\text{coord}}\mathcal{L}_{\text{coord}} + \lambda_{\text{sem}}\mathcal{L}_{\text{sem}} + \lambda_{\text{type}}\mathcal{L}_{\text{type}}$ (Phung et al., 9 Feb 2026).
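A minimal sketch of combining the three terms over a decoded sequence, with special/padding steps masked out of the geometric and semantic terms; the field names and weight values are hypothetical placeholders, not the paper's settings:

```python
import math

def total_loss(steps, w_coord=1.0, w_sem=1.0, w_type=0.5):
    """steps: list of dicts holding predicted/target coordinates,
    semantic probabilities, token-type probabilities, and an
    `is_corner` mask that excludes special tokens from the
    coordinate and semantic terms."""
    l_coord = l_sem = l_type = 0.0
    for s in steps:
        if s["is_corner"]:
            # L1 regression on coordinates, cross-entropy on semantics
            l_coord += sum(abs(p - t) for p, t in zip(s["pred_xy"], s["tgt_xy"]))
            l_sem += -math.log(s["sem_probs"][s["tgt_label"]])
        # token-type cross-entropy applies to every step
        l_type += -math.log(s["type_probs"][s["tgt_type"]])
    return w_coord * l_coord + w_sem * l_sem + w_type * l_type

# Perfect predictions drive every term to zero.
steps = [
    {"is_corner": True, "pred_xy": (0.2, 0.3), "tgt_xy": (0.2, 0.3),
     "sem_probs": {"room": 1.0}, "tgt_label": "room",
     "type_probs": {"<CORNER>": 1.0}, "tgt_type": "<CORNER>"},
    {"is_corner": False, "type_probs": {"<EOS>": 1.0}, "tgt_type": "<EOS>"},
]
loss = total_loss(steps)
```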
DPPD uses analogous losses in polar space, including polar-IoU, center regression, and smoothness regularization for ordered vertex subsequences (Zheng et al., 2023).
6. Worked Examples and Application Domains
A general example (from Raster2Seq):
- Sequence: [\texttt{<BOS>}, (12.3, 45.1; bedroom), (32.7, 45.1; bedroom), (32.7, 65.4; bedroom), (12.3, 65.4; bedroom), \texttt{<SEP>}, (25.0, 45.0; window), (35.0, 45.0; window), \texttt{<EOS>}]
- Different semantic labels encode different object classes.
- At test time, the continuous geometry is reconstructed and polygons split at \texttt{<SEP>} boundaries (Phung et al., 9 Feb 2026).
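The test-time reconstruction step above amounts to splitting the flat token stream at \texttt{<SEP>} boundaries, as in this sketch (the triple-per-vertex token encoding is an assumption carried over from the worked example):

```python
def split_polygons(tokens):
    """Drop the bracketing <BOS>/<EOS> tokens and split the remaining
    (x, y, label) vertex tokens into one list per polygon at each <SEP>."""
    assert tokens[0] == "<BOS>" and tokens[-1] == "<EOS>"
    polygons, current = [], []
    for tok in tokens[1:-1]:
        if tok == "<SEP>":
            polygons.append(current)
            current = []
        else:
            current.append(tok)  # an (x, y, label) vertex token
    if current:
        polygons.append(current)
    return polygons

polys = split_polygons([
    "<BOS>",
    (12.3, 45.1, "bedroom"), (32.7, 45.1, "bedroom"),
    (32.7, 65.4, "bedroom"), (12.3, 65.4, "bedroom"),
    "<SEP>",
    (25.0, 45.0, "window"), (35.0, 45.0, "window"),
    "<EOS>",
])
# two polygons: a 4-vertex bedroom and a 2-vertex window
```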
Key application domains:
- Floorplan vectorization and semantic editing (Phung et al., 9 Feb 2026)
- Instance segmentation and referring image segmentation (Liu et al., 2023)
- Arbitrary-shape object detection in vision (Zheng et al., 2023)
- Codeword visualization and finite-field transform analysis: mapping codewords to $N$-gon “radar charts” yields geometric signatures invariant under certain number-theoretic transforms (Oliveira et al., 2021).
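A sketch of the radar-chart idea: the $i$-th symbol of a length-$N$ codeword is placed at radius $c_i$ on the axis at angle $2\pi i/N$, tracing an $N$-gon in the complex plane. The exact normalization used by Oliveira et al. is an assumption here; one consequence of this mapping is that a cyclic shift of the codeword only rotates the polygon.

```python
import cmath, math

def codeword_to_ngon(codeword):
    """Map a length-N codeword (symbols as non-negative integers) to
    N points in the complex plane: symbol c_i sits at radius c_i on
    the radar-chart axis at angle 2*pi*i/N."""
    n = len(codeword)
    return [c * cmath.exp(2j * math.pi * i / n) for i, c in enumerate(codeword)]

pts = codeword_to_ngon([1, 1, 1, 1])  # the all-ones word traces a square: 1, i, -1, -i
```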
7. Extensions and Generalizations
The labeled polygon sequence formalism generalizes across object detection, segmentation, vector graphics extraction, and error-correcting code analysis:
- Sequences over arbitrary alphabets and dimensions: geometric mapping of codewords over a finite field $\mathrm{GF}(q)$ to $N$-gons in the complex plane (Oliveira et al., 2021).
- Differentiable resampling: DPPD’s polar parameterization enables loss computation and gradient flow for polygons of variable vertex number, with fully differentiable mapping between sparse and dense representations (Zheng et al., 2023).
- Joint image–semantics modeling: Cross-modal conditioning (e.g., via language query) in PolyFormer tightly couples semantics, geometry, and input context (Liu et al., 2023).
- Applications to CAD, error-correcting coding, self-dual code symmetry analysis, and pedagogical tool development (Oliveira et al., 2021, Phung et al., 9 Feb 2026).
The labeled polygon sequence paradigm provides a unifying structured representation bridging geometry, semantics, and sequence modeling for a wide class of modern computational perception and vector graphics tasks.