
Labeled Polygon Sequence Representation

Updated 26 March 2026
  • Labeled Polygon Sequence Representation is a formalism that encodes structured objects as ordered sequences of vertices with semantic annotations, supporting flexible polygon counts and vertex cardinality.
  • It leverages geometric quantization, embedding techniques, and autoregressive decoders to model spatial configurations in tasks like floorplan reconstruction and instance segmentation.
  • The approach integrates attention guidance, anchor mechanisms, and multi-part loss functions to refine predictions and enforce consistent polygon ordering.

A labeled polygon sequence representation encodes structured objects—such as floorplan regions, detected objects, or discrete code sequences—as an ordered sequence of geometric vertices, combined with semantic or categorical labels per vertex or polygon. This formalism underlies diverse recent advances in vector-graphics extraction from images, polygonal object detection, error-correcting code visualization, and sequence-to-sequence modeling of geometric shapes. The representation supports variable numbers of polygons, flexible vertex cardinality, semantic annotations, and efficient autoregressive or set-based generation mechanisms.

1. Formal Definition and Tokenization

For tasks such as floorplan vectorization or instance segmentation, a scene is decomposed into a sequence of labeled polygons, each represented as an ordered list of vertices. For example, in \emph{Raster2Seq} for floorplan reconstruction, each corner token c_i = (x_i, y_i, p_i) carries:

  • (x_i, y_i): continuous 2D coordinates of vertex i (normalized to image or feature-map dimensions)
  • p_i \in \Delta^C: a one-hot or probabilistic C-dimensional semantic label (e.g., room type)

Tokenization involves linearizing the entire set of polygons (e.g., all rooms followed by windows and doors) into a single sequence, bracketed by symbolic tokens. The canonical structure is:

S = [\texttt{<BOS>}, c^1_1, \ldots, c^1_{L_1}, \texttt{<SEP>}, \ldots, c^N_1, \ldots, c^N_{L_N}, \texttt{<SEP>}, \texttt{<EOS>}]

where L_k is the number of vertices in polygon k; \texttt{<SEP>} and \texttt{<EOS>} mark polygon boundaries and sequence termination (Phung et al., 9 Feb 2026, Liu et al., 2023).

Analogous structures appear in object detection and instance segmentation methods, with coordinate tokens and separator tokens encoded consistently (Liu et al., 2023, Zheng et al., 2023).
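The linearization above can be sketched in code. This is an illustrative sketch, not the implementation from any cited paper; the tuple layout and token strings are assumptions.

```python
# Hypothetical sketch: linearize labeled polygons into one token sequence
# bracketed by <BOS>/<SEP>/<EOS>, mirroring the canonical structure S above.
BOS, SEP, EOS = "<BOS>", "<SEP>", "<EOS>"

def tokenize(polygons):
    """polygons: list of (label, [(x, y), ...]); returns a flat token list."""
    seq = [BOS]
    for label, vertices in polygons:
        # each corner token carries coordinates plus its semantic label
        seq.extend((x, y, label) for x, y in vertices)
        seq.append(SEP)  # separator closes each polygon
    seq.append(EOS)      # termination token ends the sequence
    return seq
```

A room with three corners would thus become `[<BOS>, c_1, c_2, c_3, <SEP>, <EOS>]`.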

2. Geometric Quantization, Embedding, and Representation

Polygonal Geometry

Continuous (x, y) vertex coordinates are typically embedded into a high-dimensional feature space via quantization and bilinear interpolation over a learnable codebook. For Raster2Seq and PolyFormer:

  • Vertices are normalized to image bounds and mapped via bilinear interpolation among nearby codebook entries:

e_{x,y} = \sum_{i,j} w_{i,j} \, C[i,j]

where C \in \mathbb{R}^{H_b \times W_b \times D} is a codebook and w_{i,j} are interpolation weights determined by the point's fractional coordinates (Phung et al., 9 Feb 2026, Liu et al., 2023).
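The bilinear mixing of the four nearest codebook entries can be sketched as follows; shapes and the normalization convention are assumptions for illustration.

```python
import numpy as np

def embed_vertex(x, y, codebook):
    """Embed a vertex (x, y) in [0, 1]^2 via bilinear interpolation
    over a codebook of shape (H_b, W_b, D); returns a D-dim vector."""
    H, W, _ = codebook.shape
    gx, gy = x * (W - 1), y * (H - 1)              # map to codebook grid
    i0, j0 = int(gy), int(gx)                      # top-left cell indices
    i1, j1 = min(i0 + 1, H - 1), min(j0 + 1, W - 1)
    fy, fx = gy - i0, gx - j0                      # fractional coordinates
    # weights w_{i,j} from the fractional part; they sum to 1
    return ((1 - fy) * (1 - fx) * codebook[i0, j0]
            + (1 - fy) * fx * codebook[i0, j1]
            + fy * (1 - fx) * codebook[i1, j0]
            + fy * fx * codebook[i1, j1])
```

Because the weights sum to one, a constant codebook returns a constant embedding, which is a quick sanity check for the interpolation.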

In DPPD, vertices are parameterized in polar coordinates relative to an inferred object center o = (o_x, o_y), with each vertex expressed as (r_i, a_i). Angle tokens are produced via a cumulative softmax, ensuring counter-clockwise, non-overlapping ordering (Zheng et al., 2023).
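One way to realize a cumulative softmax over angles is sketched below; the exact parameterization in DPPD may differ, so treat this as an assumption-laden illustration of why the ordering constraint holds.

```python
import numpy as np

def angles_from_logits(logits):
    """Map raw angle logits to strictly increasing angles in (0, 2*pi].
    Softmax yields positive increments summing to 2*pi; the cumulative
    sum therefore produces counter-clockwise, non-overlapping angles."""
    z = np.exp(logits - np.max(logits))      # numerically stable softmax
    increments = z / z.sum() * 2 * np.pi
    return np.cumsum(increments)
```

Strict monotonicity is guaranteed because every softmax increment is positive.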

Semantic Labels

Semantic information is carried in the one-hot or probabilistic label p_i attached to each vertex, or encoded separately for the polygon. At inference, the semantic assignment is typically \operatorname{argmax}(p_i); during training, a cross-entropy loss provides supervision (Phung et al., 9 Feb 2026).

Special Tokens

Sequences always begin with \texttt{<BOS>} and end with \texttt{<EOS>}. \texttt{<SEP>} tokens split polygons. Token classes (per-step) are predicted via a classification head (Phung et al., 9 Feb 2026, Liu et al., 2023).

3. Autoregressive and Set-Based Generation

State-of-the-art models employ autoregressive decoders (e.g., Transformers) to produce labeled polygon sequences, conditioned on past outputs and image features:

  • At step t, the decoder predicts the next vertex coordinates v_t = (x_t, y_t) and label l_t from (v_{<t}, l_{<t}, I):

p(v_t, l_t \mid v_{<t}, l_{<t}, I)

DPPD departs from strict sequence modeling, instead predicting a sparse, unordered set of vertices with explicit postprocessing to enforce consistent ordering (Zheng et al., 2023).
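A minimal greedy decoding skeleton for such an autoregressive factorization might look like the following; `step` is a placeholder for a learned model of p(v_t, l_t | v_<t, l_<t, I) and is not from any cited paper.

```python
def decode(step, image_feats, max_len=256):
    """Greedy autoregressive decoding loop. `step` is a hypothetical
    callable (vertices, labels, image_feats) -> (v_t, l_t, is_eos)."""
    vertices, labels = [], []
    for _ in range(max_len):
        v, l, is_eos = step(vertices, labels, image_feats)
        if is_eos:           # model emitted the <EOS> token
            break
        vertices.append(v)   # condition the next step on past outputs
        labels.append(l)
    return vertices, labels
```

The `max_len` cap guards against a decoder that never emits `<EOS>`.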

4. Anchor Mechanisms and Attention Guidance

Attention-based decoders in labeled polygon sequence models utilize spatial anchor mechanisms to refine geometric precision and focus:

  • Each token position t is associated with a 2D anchor a_t, initialized randomly and learned during training.
  • Rather than outputting the vertex directly, the model predicts an offset \Delta_t from the anchor, so:

(\hat{x}_t, \hat{y}_t) = \operatorname{Sigmoid}(a_t) + \Delta_t

  • In transformer-based decoders, deformable attention is used: for query q_t, a small MLP predicts K offsets \{\delta_{t,k}\} around anchor a_t, and attention is restricted to those K spatial positions on the image feature map. This spatial focus yields more efficient and accurate processing of complex geometry (Phung et al., 9 Feb 2026).
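The anchor-plus-offset prediction is a one-liner in practice; the sketch below is illustrative, with values and shapes assumed.

```python
import numpy as np

def predict_vertex(anchor_logits, offset):
    """Compute (x_hat, y_hat) = Sigmoid(a_t) + Delta_t: the learned
    anchor logit is squashed into [0, 1] and the network supplies only
    a small residual offset around it."""
    a = np.asarray(anchor_logits, dtype=float)
    return 1.0 / (1.0 + np.exp(-a)) + np.asarray(offset, dtype=float)
```

Predicting a residual around a learned anchor keeps the regression target small and centered, which typically stabilizes coordinate training.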

5. Loss Functions and Training

Labeled polygon sequence models impose multiple supervisory losses:

  • Coordinate regression loss: L_{\text{coord}} = \frac{1}{L}\sum_t m_t \|(\hat{x}_t, \hat{y}_t) - (x_t, y_t)\|_1 for vertex localization.
  • Semantic classification loss: L_{\text{sem}} = \frac{1}{L}\sum_{t=1}^L m_t \cdot \mathrm{CE}(\hat{p}_t, p_t) for semantic consistency, with m_t masking special/padding tokens.
  • Token-type loss: L_{\text{token}} = \frac{1}{L}\sum_t m_t \cdot \mathrm{CE}(\hat{q}_t, q_t), where q_t indicates the token class (\texttt{<CORNER>}, \texttt{<SEP>}, \texttt{<EOS>}).
  • The total loss is a weighted sum: L = \lambda_{\text{coord}} L_{\text{coord}} + \lambda_{\text{token}} L_{\text{token}} + \lambda_{\text{sem}} L_{\text{sem}} (Phung et al., 9 Feb 2026).

DPPD uses analogous losses in polar space, including polar-IoU, center regression, and smoothness regularization for ordered vertex subsequences (Zheng et al., 2023).
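The masked, weighted combination of the coordinate and semantic terms can be sketched as follows; the weight values and the probability-vector form of cross-entropy are assumptions, and the token-type term is omitted for brevity.

```python
import numpy as np

def total_loss(pred_xy, gt_xy, pred_p, gt_p, mask,
               lam_coord=1.0, lam_sem=1.0):
    """Masked L1 coordinate loss plus masked cross-entropy semantic loss.
    mask m_t zeroes out special/padding token positions; lambdas are
    illustrative weighting hyperparameters."""
    m = np.asarray(mask, dtype=float)
    L = len(m)
    # L_coord: per-step L1 distance between predicted and true vertices
    coord = (m * np.abs(pred_xy - gt_xy).sum(axis=1)).sum() / L
    # L_sem: cross-entropy between predicted and true label distributions
    ce = -(gt_p * np.log(pred_p + 1e-9)).sum(axis=1)
    sem = (m * ce).sum() / L
    return lam_coord * coord + lam_sem * sem
```

With perfect predictions both terms vanish, which makes the function easy to sanity-check.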

6. Worked Examples and Application Domains

A general example (from Raster2Seq):

  • Sequence: [\texttt{<BOS>}, (12.3, 45.1; p = bedroom), (32.7, 45.1; p = bedroom), (32.7, 65.4; p = bedroom), (12.3, 65.4; p = bedroom), \texttt{<SEP>}, (25.0, 45.0; p = window), (35.0, 45.0; p = window), \texttt{<EOS>}]
  • Different semantic labels encode different object classes.
  • At test time, the continuous geometry is reconstructed and polygons split at \texttt{<SEP>} boundaries (Phung et al., 9 Feb 2026).
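The test-time splitting at \texttt{<SEP>} boundaries can be sketched like this; the token strings are assumptions matching the worked example above.

```python
def split_polygons(tokens):
    """Split a flat token stream into per-polygon vertex lists at <SEP>
    boundaries, discarding the <BOS>/<SEP>/<EOS> markers."""
    polygons, current = [], []
    for tok in tokens:
        if tok == "<BOS>":
            continue
        if tok in ("<SEP>", "<EOS>"):
            if current:                # close the polygon in progress
                polygons.append(current)
            current = []
        else:
            current.append(tok)        # a vertex token
    return polygons
```

Applied to the example sequence, this recovers one bedroom polygon and one window polyline.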

Key application domains:

  • Floorplan vectorization and semantic editing (Phung et al., 9 Feb 2026)
  • Instance segmentation and referring image segmentation (Liu et al., 2023)
  • Arbitrary-shape object detection in vision (Zheng et al., 2023)
  • Codeword visualization and finite-field transform analysis: mapping x \in \mathrm{GF}(p)^N to N-gon “radar charts” yields geometric signatures invariant under certain number-theoretic transforms (Oliveira et al., 2021).
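One plausible form of the codeword-to-N-gon mapping is sketched below: each symbol x_k is plotted at radius x_k along the k-th of N equally spaced directions in the complex plane. The exact convention used by Oliveira et al. is an assumption here.

```python
import numpy as np

def radar_vertices(codeword, p):
    """Map a codeword in GF(p)^N to N complex vertex positions of a
    'radar chart' N-gon: symbol x_k at radius x_k, angle 2*pi*k/N."""
    x = np.asarray(codeword)
    assert np.all((0 <= x) & (x < p)), "symbols must lie in GF(p)"
    N = len(x)
    angles = 2 * np.pi * np.arange(N) / N
    return x * np.exp(1j * angles)
```

The all-zero codeword collapses to the origin, while a constant codeword traces a regular N-gon.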

7. Extensions and Generalizations

The labeled polygon sequence formalism generalizes across object detection, segmentation, vector graphics extraction, and error-correcting code analysis:

  • Sequences over arbitrary alphabets and dimensions: geometric mapping of codewords over \mathrm{GF}(p) as N-gons in the complex plane (Oliveira et al., 2021).
  • Differentiable resampling: DPPD’s polar parameterization enables loss computation and gradient flow for polygons of variable vertex number, with fully differentiable mapping between sparse and dense representations (Zheng et al., 2023).
  • Joint image–semantics modeling: Cross-modal conditioning (e.g., via language query) in PolyFormer tightly couples semantics, geometry, and input context (Liu et al., 2023).
  • Applications to CAD, error-correcting coding, self-dual code symmetry analysis, and pedagogical tool development (Oliveira et al., 2021, Phung et al., 9 Feb 2026).

The labeled polygon sequence paradigm provides a unifying structured representation bridging geometry, semantics, and sequence modeling for a wide class of modern computational perception and vector graphics tasks.
