Labeled Polygon Sequence Representation
- Labeled Polygon Sequence Representation is a formalism that encodes structured objects as ordered sequences of vertices with semantic annotations, supporting flexible polygon counts and vertex cardinality.
- It leverages geometric quantization, embedding techniques, and autoregressive decoders to model spatial configurations in tasks like floorplan reconstruction and instance segmentation.
- The approach integrates attention guidance, anchor mechanisms, and multi-part loss functions to refine predictions and enforce consistent polygon ordering.
A labeled polygon sequence representation encodes structured objects—such as floorplan regions, detected objects, or discrete code sequences—as an ordered sequence of geometric vertices, combined with semantic or categorical labels per vertex or polygon. This formalism underlies diverse recent advances in vector-graphics extraction from images, polygonal object detection, error-correcting code visualization, and sequence-to-sequence modeling of geometric shapes. The representation supports variable numbers of polygons, flexible vertex cardinality, semantic annotations, and efficient autoregressive or set-based generation mechanisms.
1. Formal Definition and Tokenization
For tasks such as floorplan vectorization or instance segmentation, a scene is decomposed into a sequence of labeled polygons, each represented as an ordered list of vertices. In \emph{Raster2Seq} for floorplan reconstruction, for example, each corner token carries:
- $v_i \in [0,1]^2$: continuous 2D coordinates of vertex $i$, normalized to image or feature-map dimensions
- $c_i$: a one-hot or probabilistic $K$-dimensional semantic label (e.g., room type)
Tokenization involves linearizing the entire set of polygons (e.g., all rooms followed by windows and doors) into a single sequence, bracketed by symbolic tokens. The canonical structure is:
$$\texttt{<BOS>},\; v^{(1)}_1, \dots, v^{(1)}_{n_1},\; \texttt{<SEP>},\; v^{(2)}_1, \dots, v^{(2)}_{n_2},\; \texttt{<SEP>},\; \dots,\; \texttt{<EOS>}$$
where $n_k$ is the number of vertices in polygon $k$; \texttt{<SEP>} and \texttt{<EOS>} mark polygon boundaries and sequence termination (Phung et al., 9 Feb 2026, Liu et al., 2023).
Analogous structures appear in object detection and instance segmentation methods, with coordinate tokens and separator tokens encoded consistently (Liu et al., 2023, Zheng et al., 2023).
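The linearization above can be sketched in a few lines; the token names follow the \texttt{<BOS>}/\texttt{<SEP>}/\texttt{<EOS>} convention described here, but the exact token vocabulary and ordering policy of Raster2Seq are assumptions:

```python
BOS, SEP, EOS = "<BOS>", "<SEP>", "<EOS>"

def linearize(polygons):
    """polygons: list of (label, [(x, y), ...]) pairs.

    Each vertex becomes an (x, y, label) triple; polygons are joined
    by <SEP>, and the sequence is bracketed by <BOS>/<EOS>."""
    tokens = [BOS]
    for i, (label, vertices) in enumerate(polygons):
        if i > 0:
            tokens.append(SEP)
        tokens.extend((x, y, label) for x, y in vertices)
    tokens.append(EOS)
    return tokens

seq = linearize([
    ("bedroom", [(12.3, 45.1), (32.7, 45.1), (32.7, 65.4), (12.3, 65.4)]),
    ("window",  [(25.0, 45.0), (35.0, 45.0)]),
])
# seq has 9 tokens: <BOS>, 4 bedroom vertices, <SEP>, 2 window vertices, <EOS>
```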
2. Geometric Quantization, Embedding, and Representation
Polygonal Geometry
Continuous vertex coordinates are typically embedded into a high-dimensional feature space via quantization and bilinear interpolation over a learnable codebook. For Raster2Seq and PolyFormer:
- Vertices are normalized to image bounds and mapped via bilinear interpolation among nearby codebook entries:
$$e(v) = \sum_{j} w_j \, C_j$$
where the sum runs over the grid entries $C_j$ of a learnable codebook $C$ nearest to $v$, and $w_j$ are interpolation weights determined by the point's fractional coordinates (Phung et al., 9 Feb 2026, Liu et al., 2023).
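A minimal sketch of this embedding step, assuming the codebook is a square grid of $d$-dimensional vectors (grid size and layout are illustrative, not the papers' settings):

```python
import math

def embed_vertex(x, y, codebook, grid):
    """Bilinearly interpolate among the 4 codebook entries surrounding
    the normalized point (x, y) in [0, 1]^2. `codebook` is a
    grid x grid nested list of d-dim vectors (stand-in for a learnable table)."""
    gx, gy = x * (grid - 1), y * (grid - 1)
    x0, y0 = int(math.floor(gx)), int(math.floor(gy))
    x1, y1 = min(x0 + 1, grid - 1), min(y0 + 1, grid - 1)
    fx, fy = gx - x0, gy - y0
    # Weights from the point's fractional coordinates within its grid cell.
    corners = [
        ((1 - fx) * (1 - fy), codebook[y0][x0]),
        (fx * (1 - fy),       codebook[y0][x1]),
        ((1 - fx) * fy,       codebook[y1][x0]),
        (fx * fy,             codebook[y1][x1]),
    ]
    d = len(codebook[0][0])
    return [sum(w * e[k] for w, e in corners) for k in range(d)]

# 2x2 codebook of 1-dim entries; the center point averages all four.
codebook = [[[0.0], [1.0]], [[2.0], [3.0]]]
emb = embed_vertex(0.5, 0.5, codebook, grid=2)
```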
In DPPD, vertices are parameterized in polar coordinates relative to an inferred object center $(x_c, y_c)$, with each vertex represented as $(\rho_i, \theta_i)$. Angle tokens are produced via a cumulative softmax, ensuring counter-clockwise, non-overlapping ordering (Zheng et al., 2023).
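The cumulative-softmax ordering can be sketched as follows; mapping the cumulative sum to $(0, 2\pi]$ is an illustrative choice, and the exact scaling in DPPD may differ:

```python
import math

def cumulative_softmax_angles(logits):
    """Map unconstrained logits to strictly increasing angles in (0, 2*pi].
    Softmax yields positive increments summing to 1, so the cumulative
    sum is monotone: angles come out counter-clockwise and non-overlapping."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]  # shifted for numerical stability
    total = sum(exps)
    angles, acc = [], 0.0
    for e in exps:
        acc += e / total
        angles.append(2 * math.pi * acc)
    return angles

angles = cumulative_softmax_angles([0.0, 0.0, 0.0, 0.0])
# equal logits -> equally spaced angles: pi/2, pi, 3*pi/2, 2*pi
```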
Semantic Labels
Semantic information is carried in the one-hot/probabilistic label vector $c_i$ attached to each vertex, or encoded separately for the polygon. At inference, semantic assignment is typically $\hat{k} = \arg\max_k c_{i,k}$; at training, a cross-entropy loss is applied for supervision (Phung et al., 9 Feb 2026).
Special Tokens
Sequences always begin with \texttt{<BOS>} and end with \texttt{<EOS>}. \texttt{<SEP>} tokens split polygons. Token classes (per-step) are predicted via a classification head (Phung et al., 9 Feb 2026, Liu et al., 2023).
3. Autoregressive and Set-Based Generation
State-of-the-art models employ autoregressive decoders (e.g., Transformers) to produce labeled polygon sequences, conditioned on past outputs and image features:
- At step $t$, the decoder predicts the next vertex coordinates and label from the previously generated tokens and the encoded image features $\mathbf{F}$, modeling $p(v_t, c_t \mid v_{<t}, c_{<t}, \mathbf{F})$.
- Causal masking ensures the decoder attends only to previously generated content.
- During inference, generation proceeds sequentially until an \texttt{<EOS>} token is produced (Phung et al., 9 Feb 2026, Liu et al., 2023).
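The sequential generation loop can be sketched as below; `decoder_step` is a hypothetical stand-in for one causal transformer decoding step conditioned on image features:

```python
BOS, EOS = "<BOS>", "<EOS>"

def generate(decoder_step, image_features, max_len=256):
    """Greedy autoregressive decoding: feed back all previously
    generated tokens until <EOS> (or a length cap) is reached."""
    tokens = [BOS]
    while len(tokens) < max_len:
        nxt = decoder_step(tokens, image_features)  # causal: sees only the past
        tokens.append(nxt)
        if nxt == EOS:
            break
    return tokens

# Toy "decoder" that scripts a fixed two-vertex polygon, then terminates.
script = [(0.1, 0.2, "room"), (0.3, 0.2, "room"), EOS]
step = lambda toks, feats: script[len(toks) - 1]
out = generate(step, image_features=None)
```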
DPPD departs from strict sequence modeling, instead predicting a sparse, unordered set of vertices with explicit postprocessing to enforce consistent ordering (Zheng et al., 2023).
4. Anchor Mechanisms and Attention Guidance
Attention-based decoders in labeled polygon sequence models utilize spatial anchor mechanisms to refine geometric precision and focus:
- Each token position $t$ is associated with a 2D anchor $a_t$, initialized randomly and learned during training.
- Rather than directly outputting the vertex, the model predicts an offset $\Delta_t$ from the anchor, so $v_t = a_t + \Delta_t$.
- In transformer-based decoders, deformable attention is used: for query $q_t$, a small MLP predicts sampling offsets around anchor $a_t$, and attention is restricted to those spatial positions on the image feature map. This spatial focus yields more efficient and accurate processing of complex geometry (Phung et al., 9 Feb 2026).
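The anchor-plus-offset parameterization can be sketched as follows; the tanh squashing and the `scale` bound are illustrative stabilizing assumptions, not the paper's exact parameterization:

```python
import math

def vertex_from_anchor(anchor, raw_offset, scale=0.1):
    """Predict a vertex as anchor + bounded offset: the raw network
    output is squashed through tanh so the vertex stays within
    `scale` of its learned anchor in each coordinate."""
    return tuple(a + scale * math.tanh(o) for a, o in zip(anchor, raw_offset))

v_centered = vertex_from_anchor((0.5, 0.5), (0.0, 0.0))   # zero offset -> anchor itself
v_saturated = vertex_from_anchor((0.5, 0.5), (100.0, -100.0))  # offsets saturate at +/-scale
```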
5. Loss Functions and Training
Labeled polygon sequence models impose multiple supervisory losses:
- Coordinate regression loss: $\mathcal{L}_{\text{coord}} = \lVert v_t - \hat{v}_t \rVert_1$ for vertex localization.
- Semantic classification loss: $\mathcal{L}_{\text{sem}} = \mathrm{CE}(c_t, \hat{c}_t)$ for semantic consistency, with special/padding tokens masked out.
- Token-type loss: $\mathcal{L}_{\text{type}} = \mathrm{CE}(\tau_t, \hat{\tau}_t)$, where $\tau_t$ indicates the token class (\texttt{<CORNER>}, \texttt{<SEP>}, \texttt{<EOS>}).
- The total loss is a weighted sum: $\mathcal{L} = \lambda_{\text{coord}}\mathcal{L}_{\text{coord}} + \lambda_{\text{sem}}\mathcal{L}_{\text{sem}} + \lambda_{\text{type}}\mathcal{L}_{\text{type}}$ (Phung et al., 9 Feb 2026).
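A minimal sketch of combining the three terms over a decoded sequence, with special/padding steps masked out of the geometric and semantic terms; the field names and weight values are hypothetical placeholders, not the paper's settings:

```python
import math

def total_loss(steps, w_coord=1.0, w_sem=1.0, w_type=0.5):
    """steps: list of dicts holding predicted/target coordinates,
    semantic probabilities, token-type probabilities, and an
    `is_corner` mask that excludes special tokens from the
    coordinate and semantic terms."""
    l_coord = l_sem = l_type = 0.0
    for s in steps:
        if s["is_corner"]:
            # L1 regression on coordinates, cross-entropy on semantics
            l_coord += sum(abs(p - t) for p, t in zip(s["pred_xy"], s["tgt_xy"]))
            l_sem += -math.log(s["sem_probs"][s["tgt_label"]])
        # token-type cross-entropy applies to every step
        l_type += -math.log(s["type_probs"][s["tgt_type"]])
    return w_coord * l_coord + w_sem * l_sem + w_type * l_type

# Perfect predictions drive every term to zero.
steps = [
    {"is_corner": True, "pred_xy": (0.2, 0.3), "tgt_xy": (0.2, 0.3),
     "sem_probs": {"room": 1.0}, "tgt_label": "room",
     "type_probs": {"<CORNER>": 1.0}, "tgt_type": "<CORNER>"},
    {"is_corner": False, "type_probs": {"<EOS>": 1.0}, "tgt_type": "<EOS>"},
]
loss = total_loss(steps)
```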
DPPD uses analogous losses in polar space, including polar-IoU, center regression, and smoothness regularization for ordered vertex subsequences (Zheng et al., 2023).
6. Worked Examples and Application Domains
A general example (from Raster2Seq):
- Sequence: [\texttt{<BOS>}, (12.3, 45.1; bedroom), (32.7, 45.1; bedroom), (32.7, 65.4; bedroom), (12.3, 65.4; bedroom), \texttt{<SEP>}, (25.0, 45.0; window), (35.0, 45.0; window), \texttt{<EOS>}]
- Different semantic labels encode different object classes.
- At test time, the continuous geometry is reconstructed and polygons split at \texttt{<SEP>} boundaries (Phung et al., 9 Feb 2026).
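The test-time reconstruction step above amounts to splitting the flat token stream at \texttt{<SEP>} boundaries, as in this sketch (the triple-per-vertex token encoding is an assumption carried over from the worked example):

```python
def split_polygons(tokens):
    """Drop the bracketing <BOS>/<EOS> tokens and split the remaining
    (x, y, label) vertex tokens into one list per polygon at each <SEP>."""
    assert tokens[0] == "<BOS>" and tokens[-1] == "<EOS>"
    polygons, current = [], []
    for tok in tokens[1:-1]:
        if tok == "<SEP>":
            polygons.append(current)
            current = []
        else:
            current.append(tok)  # an (x, y, label) vertex token
    if current:
        polygons.append(current)
    return polygons

polys = split_polygons([
    "<BOS>",
    (12.3, 45.1, "bedroom"), (32.7, 45.1, "bedroom"),
    (32.7, 65.4, "bedroom"), (12.3, 65.4, "bedroom"),
    "<SEP>",
    (25.0, 45.0, "window"), (35.0, 45.0, "window"),
    "<EOS>",
])
# two polygons: a 4-vertex bedroom and a 2-vertex window
```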
Key application domains:
- Floorplan vectorization and semantic editing (Phung et al., 9 Feb 2026)
- Instance segmentation and referring image segmentation (Liu et al., 2023)
- Arbitrary-shape object detection in vision (Zheng et al., 2023)
- Codeword visualization and finite-field transform analysis: mapping codewords to $N$-gon “radar charts” yields geometric signatures invariant under certain number-theoretic transforms (Oliveira et al., 2021).
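A sketch of the radar-chart idea: the $i$-th symbol of a length-$N$ codeword is placed at radius $c_i$ on the axis at angle $2\pi i/N$, tracing an $N$-gon in the complex plane. The exact normalization used by Oliveira et al. is an assumption here; one consequence of this mapping is that a cyclic shift of the codeword only rotates the polygon.

```python
import cmath, math

def codeword_to_ngon(codeword):
    """Map a length-N codeword (symbols as non-negative integers) to
    N points in the complex plane: symbol c_i sits at radius c_i on
    the radar-chart axis at angle 2*pi*i/N."""
    n = len(codeword)
    return [c * cmath.exp(2j * math.pi * i / n) for i, c in enumerate(codeword)]

pts = codeword_to_ngon([1, 1, 1, 1])  # the all-ones word traces a square: 1, i, -1, -i
```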
7. Extensions and Generalizations
The labeled polygon sequence formalism generalizes across object detection, segmentation, vector graphics extraction, and error-correcting code analysis:
- Sequences over arbitrary alphabets and dimensions: geometric mapping of codewords over a finite field $\mathrm{GF}(q)$ to $N$-gons in the complex plane (Oliveira et al., 2021).
- Differentiable resampling: DPPD’s polar parameterization enables loss computation and gradient flow for polygons of variable vertex number, with fully differentiable mapping between sparse and dense representations (Zheng et al., 2023).
- Joint image–semantics modeling: Cross-modal conditioning (e.g., via language query) in PolyFormer tightly couples semantics, geometry, and input context (Liu et al., 2023).
- Applications to CAD, error-correcting coding, self-dual code symmetry analysis, and pedagogical tool development (Oliveira et al., 2021, Phung et al., 9 Feb 2026).
The labeled polygon sequence paradigm provides a unifying structured representation bridging geometry, semantics, and sequence modeling for a wide class of modern computational perception and vector graphics tasks.