Towards General-Purpose Representation Learning of Polygonal Geometries (2209.15458v1)

Published 29 Sep 2022 in cs.CV, cs.AI, and cs.LG

Abstract: Neural network representation learning for spatial data is a common need for geographic artificial intelligence (GeoAI) problems. In recent years, many advancements have been made in representation learning for points, polylines, and networks, whereas little progress has been made for polygons, especially complex polygonal geometries. In this work, we focus on developing a general-purpose polygon encoding model, which can encode a polygonal geometry (with or without holes, single or multipolygons) into an embedding space. The resulting embeddings can be leveraged directly (or finetuned) for downstream tasks such as shape classification, spatial relation prediction, and so on. To achieve model generalizability guarantees, we identify a few desirable properties: loop origin invariance, trivial vertex invariance, part permutation invariance, and topology awareness. We explore two different designs for the encoder: one derives all representations in the spatial domain; the other leverages spectral domain representations. For the spatial domain approach, we propose ResNet1D, a 1D CNN-based polygon encoder, which uses circular padding to achieve loop origin invariance on simple polygons. For the spectral domain approach, we develop NUFTspec based on Non-Uniform Fourier Transformation (NUFT), which naturally satisfies all the desired properties. We conduct experiments on two tasks: 1) shape classification based on MNIST; 2) spatial relation prediction based on two new datasets, DBSR-46K and DBSR-cplx46K. Our results show that NUFTspec and ResNet1D outperform multiple existing baselines by significant margins. While ResNet1D suffers from model performance degradation after shape-invariance geometry modifications, NUFTspec is very robust to these modifications due to the nature of the NUFT.

Authors (9)

Gengchen Mai
Chiyu Jiang
Weiwei Sun
Rui Zhu
Yao Xuan
Ling Cai
Krzysztof Janowicz
Stefano Ermon
Ni Lao

Summary

This paper introduces methods for learning general-purpose vector representations (embeddings) of polygonal geometries, including simple polygons, polygons with holes, and multipolygons. The goal is to create encoders whose output embeddings can be used directly or fine-tuned for various downstream Geographic AI (GeoAI) tasks like shape classification, spatial relation prediction, building pattern classification, and geographic question answering, without relying on manually engineered shape descriptors.

The authors identify four desirable properties for such encoders to ensure generalizability:

- Loop Origin Invariance: The embedding should be the same regardless of which vertex is chosen as the starting point for the polygon's boundary sequence.
- Trivial Vertex Invariance: Adding or removing vertices that lie on a straight edge between two existing vertices (and therefore do not change the shape) should not change the embedding.
- Part Permutation Invariance: For multipolygons, the order in which the constituent polygons are processed should not affect the final embedding.
- Topology Awareness: The encoder must distinguish between exterior boundaries and interior holes, capturing the geometry's topological structure (e.g., a polygon with a hole vs. two separate polygons).
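
To make the first three properties concrete, the following Python sketch implements the shape-preserving modifications they refer to and checks them against one quantity that is known to be invariant: the total signed area, where holes are subtracted because they are oriented clockwise. The nested-list data layout and all function names are hypothetical illustrations, not the paper's code. Total area is of course not a useful embedding; it only serves as a reference, since an encoder with the properties above should, like the area, give the same output for the original and modified inputs.

```python
import random
from typing import List, Tuple

# Hypothetical data layout (not from the paper):
Point = Tuple[float, float]
Ring = List[Point]              # closed ring; the first vertex is not repeated at the end
Polygon = List[Ring]            # [exterior (counter-clockwise), hole_1 (clockwise), ...]
MultiPolygon = List[Polygon]    # one Polygon per part


def shift_loop_origin(ring: Ring, k: int) -> Ring:
    """Start the boundary sequence at vertex k instead of vertex 0."""
    k %= len(ring)
    return ring[k:] + ring[:k]


def add_trivial_vertex(ring: Ring, i: int) -> Ring:
    """Insert the midpoint of edge (i, i+1); it is collinear, so the shape is unchanged."""
    (x0, y0), (x1, y1) = ring[i], ring[(i + 1) % len(ring)]
    return ring[: i + 1] + [((x0 + x1) / 2, (y0 + y1) / 2)] + ring[i + 1 :]


def permute_parts(mp: MultiPolygon, seed: int = 0) -> MultiPolygon:
    """Shuffle the order of the parts of a multipolygon."""
    parts = list(mp)
    random.Random(seed).shuffle(parts)
    return parts


def signed_area(ring: Ring) -> float:
    """Shoelace formula: positive for counter-clockwise rings, negative for clockwise ones."""
    n = len(ring)
    return 0.5 * sum(
        ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
        for i in range(n)
    )


def total_area(mp: MultiPolygon) -> float:
    """Holes are clockwise, so their negative area is subtracted automatically."""
    return sum(signed_area(ring) for polygon in mp for ring in polygon)


# A 4x4 square with a 1x1 hole, plus a separate triangle.
exterior = [(0, 0), (4, 0), (4, 4), (0, 4)]
hole = [(1, 1), (1, 2), (2, 2), (2, 1)]
triangle = [(5, 0), (6, 0), (5, 1)]
shape = [[exterior, hole], [triangle]]

modified = permute_parts(
    [[shift_loop_origin(exterior, 2), add_trivial_vertex(hole, 3)], [triangle]], seed=1
)
print(total_area(shape), total_area(modified))  # 15.5 15.5
```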

Two main approaches are proposed and evaluated:

1. ResNet1D (Spatial Domain Encoder)

Concept: Treats the polygon boundary as a 1D sequence of vertex coordinates and applies a modified 1D Residual Network (ResNet).
Implementation:
- Designed primarily for simple polygons.
- Input: A sequence of vertex coordinates $(x_0, x_1, \ldots, x_{n-1})$, where each $x_i \in \mathbb{R}^2$.
- Preprocessing: Uses a "KDelta point encoder" where each point's initial feature vector includes its own coordinates and the relative displacement vectors to its $2t$ neighbors (Equation 1). This incorporates local context.
- Architecture: Employs 1D Convolutional layers, Batch Normalization, ReLU, Max Pooling, and ResNet blocks. Crucially, it uses circular padding instead of zero padding in convolutional and pooling layers. This is key to achieving loop origin invariance for simple polygons. A final Global Max Pooling layer produces the embedding (Equation 2, Figure 2a).
- Handling Complex Geometries: For polygons with holes or multipolygons, the paper concatenates the vertex sequences of all exterior and interior rings into a single sequence. This discards which vertices belong to which ring, so the encoder loses loop origin invariance and part permutation invariance and is not topology aware. It is also sensitive to trivial vertices.
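
The two ingredients that give ResNet1D loop origin invariance, KDelta point features and circular padding followed by global max pooling, can be illustrated with a toy NumPy forward pass. This is a minimal sketch under stated assumptions: the weights are random, batch normalization and strided max pooling are omitted, and the single residual block, the function names, and the feature ordering are hypothetical rather than the paper's implementation. Because every layer here has stride 1, cyclically shifting the vertex sequence only permutes the rows of each feature map, so the global max is exactly unchanged; downsampling layers generally make this invariance only approximate.

```python
import numpy as np


def kdelta_features(ring: np.ndarray, t: int) -> np.ndarray:
    """ring: (n, 2) vertex array. Returns (n, 2 + 4t): each vertex's coordinates
    followed by displacement vectors to its t following and t preceding vertices.
    Indices wrap around the ring (assumed reading of the KDelta encoder)."""
    feats = [ring]
    for d in range(1, t + 1):
        feats.append(np.roll(ring, -d, axis=0) - ring)  # vertex i+d minus vertex i
        feats.append(np.roll(ring, d, axis=0) - ring)   # vertex i-d minus vertex i
    return np.concatenate(feats, axis=1)


def circular_conv1d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Stride-1 1D convolution with circular padding.
    x: (n, c_in), w: (k, c_in, c_out) with odd k. Returns (n, c_out)."""
    half = w.shape[0] // 2
    n = x.shape[0]
    # Wrap the sequence: the last `half` rows pad the front, the first `half` pad the back.
    xp = np.concatenate([x[n - half :], x, x[:half]], axis=0)
    return sum(xp[j : j + n] @ w[j] for j in range(w.shape[0]))


def toy_encoder(ring, w_in, w_res, t=2):
    h = np.maximum(circular_conv1d(kdelta_features(ring, t), w_in), 0.0)  # conv + ReLU
    h = h + np.maximum(circular_conv1d(h, w_res), 0.0)                   # one residual block
    return h.max(axis=0)                                                 # global max pooling


rng = np.random.default_rng(0)
t = 2
w_in = rng.normal(size=(3, 2 + 4 * t, 16))
w_res = rng.normal(size=(3, 16, 16)) / 4
# A random star-shaped (hence simple) 12-gon.
angles = np.sort(rng.uniform(0, 2 * np.pi, size=12))
ring = np.stack([np.cos(angles), np.sin(angles)], axis=1) * rng.uniform(0.5, 1.0, size=(12, 1))

emb = toy_encoder(ring, w_in, w_res, t)
emb_shifted = toy_encoder(np.roll(ring, 5, axis=0), w_in, w_res, t)
print(np.allclose(emb, emb_shifted))  # True
```

Feeding a concatenation of exterior and hole vertices into the same functions would make the wrap-around displacements connect unrelated rings, which is one way to see why the concatenation strategy loses topological information.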

2. NUFTspec (Spectral Domain Encoder)

Concept: Leverages the Non-Uniform Fourier Transform (NUFT) to transform the polygonal geometry into the spectral domain, then learns an embedding from these spectral features using a Multi-Layer Perceptron (MLP).
Implementation:
- Handles complex polygonal geometries (with holes, multipolygons) naturally.
- Preprocessing (Polygon-to-Simplex Mesh):
  1. Normalization: Applies affine transformations (translation, scaling) to map the geometry into a canonical space (e.g., $[0, 2] \times [0, 2]$).
  2. Auxiliary Node: Adds the origin $O = (0, 0)$ as an auxiliary vertex.
  3. Triangulation: Converts each boundary edge $(A, B)$ of the polygon (both exterior and interior rings) into a 2-simplex (triangle) $\triangle_{ABO}$ by connecting its endpoints to the origin. This creates a 2-simplex mesh $S^{(j)}$ (Figure 3d, Figure 4).
  4. Orientation: Preserves the orientation of boundaries (counter-clockwise for exteriors, clockwise for interiors), so each simplex has a signed area (content) and the contributions of an interior ring subtract the region of its hole, thus encoding topology.
- NUFT Calculation: Computes the NUFT of a piecewise-constant function defined over the simplex mesh (Equations 4-8). The NUFT yields a set of complex Fourier coefficients $F_S^{(j)}(x)$ corresponding to a chosen set of 2D Fourier frequencies $W = \{w_k\}$.
- Frequency Maps: Explores two types:
  - Linear Grid ($W^{(fft)}$): Standard integer frequencies as used in the FFT (Definition 8, Figure 5a). Required if an inverse FFT (IFFT) is applied afterwards, as in DDSL.
  - Geometric Grid ($W^{(gmf)}$): Non-integer frequencies forming a geometric series (Definition 9, Figure 5b). Because NUFTspec never applies an IFFT, the frequencies need not lie on an integer grid; this flexibility may let the geometric grid better capture information in non-uniform data.
- Architecture: The resulting complex NUFT features $F_S^{(j)}(x)$ are converted into a real-valued vector, potentially normalized or projected using PCA, and then fed into an MLP to produce the final polygon embedding (Equation 10, Figure 2b).
- Properties: By construction, NUFTspec satisfies all four desirable properties: loop origin invariance, trivial vertex invariance, part permutation invariance, and topology awareness (Theorem 2).
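
The polygon-to-simplex conversion and the NUFT of the resulting mesh can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' code. It assumes rings are given as (n, 2) arrays with exteriors counter-clockwise and holes clockwise, and it uses the standard closed form for the Fourier transform $\int_T e^{-i w \cdot x}\,dx$ of a uniform triangle $T$ with vertices $x_0, x_1, x_2$ and signed area $C_T$: $F_T(w) = -2\,C_T \sum_{j} e^{-i w \cdot x_j} / \prod_{k \neq j} (w \cdot x_j - w \cdot x_k)$. That formula requires the projections $w \cdot x_j$ within each triangle to be distinct; zero-area triangles are skipped, but other coincidences would need a limit form that this sketch does not implement. The function names and the real/imaginary stacking at the end are hypothetical choices.

```python
import numpy as np


def polygon_to_mesh(rings):
    """Turn every oriented edge (A, B) of every ring into the triangle (A, B, O), O = origin.
    rings: list of (n_i, 2) arrays (all rings of all parts). Returns (m, 3, 2)."""
    tris = []
    for ring in rings:
        a, b = ring, np.roll(ring, -1, axis=0)  # edge i runs from vertex i to vertex i+1
        tris.append(np.stack([a, b, np.zeros_like(a)], axis=1))
    return np.concatenate(tris, axis=0)


def signed_areas(tris):
    """Signed area of (A, B, O) is cross(A, B) / 2; negative for clockwise triangles."""
    a, b = tris[:, 0], tris[:, 1]
    return 0.5 * (a[:, 0] * b[:, 1] - a[:, 1] * b[:, 0])


def nuft(tris, freqs, eps=1e-12):
    """Signed sum of triangle transforms at each frequency in freqs (K, 2). Returns (K,) complex."""
    area = signed_areas(tris)
    keep = np.abs(area) > eps                       # zero-area triangles contribute nothing
    tris, area = tris[keep], area[keep]
    proj = np.einsum("kd,mjd->kmj", freqs, tris)    # (K, m, 3): w . x_j
    total = np.zeros(len(freqs), dtype=complex)
    for j in range(3):
        k1, k2 = [k for k in range(3) if k != j]
        denom = (proj[..., j] - proj[..., k1]) * (proj[..., j] - proj[..., k2])
        total += (area * np.exp(-1j * proj[..., j]) / denom).sum(axis=1)
    return -2.0 * total


def to_real_features(F):
    """One simple way to obtain a real vector for the MLP: stack real and imaginary parts."""
    return np.concatenate([F.real, F.imag])


def box_ft(lo, hi, w):
    """Exact 1D transform of the indicator of [lo, hi]: integral of exp(-i w x) dx."""
    return (np.exp(-1j * w * lo) - np.exp(-1j * w * hi)) / (1j * w)


freqs = np.array([[1.3, 0.7], [-0.9, 2.1]])
exterior = np.array([[0.5, 0.5], [3.5, 0.5], [3.5, 3.5], [0.5, 3.5]])  # counter-clockwise
hole = np.array([[1.5, 1.5], [1.5, 2.5], [2.5, 2.5], [2.5, 1.5]])      # clockwise

F = nuft(polygon_to_mesh([exterior, hole]), freqs)
# A square with a square hole has a separable exact transform: outer box minus inner box.
expected = (box_ft(0.5, 3.5, freqs[:, 0]) * box_ft(0.5, 3.5, freqs[:, 1])
            - box_ft(1.5, 2.5, freqs[:, 0]) * box_ft(1.5, 2.5, freqs[:, 1]))
print(np.allclose(F, expected))  # True
```

Rotating a ring's start vertex or reordering the rings only reorders the summed triangles, and inserting a collinear vertex splits one triangle into two that cover the same signed region, which is why these modifications leave the output unchanged up to floating-point error.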

Experiments and Results

The models were evaluated on two tasks using newly introduced datasets:

Shape Classification (MNIST-cplx70k):
- Dataset created by converting MNIST images into potentially complex polygons.
- ResNet1D achieved slightly higher overall accuracy, excelling at capturing local details.
- NUFTspec performed better on multipolygon samples and demonstrated robustness to shape-invariant modifications (loop origin changes, trivial vertex addition, part permutation), confirming its theoretical properties. ResNet1D's performance degraded under these modifications.
- NUFTspec using geometric frequencies ( $W^{(gmf)}$ ) outperformed linear frequencies ( $W^{(fft)}$ ) and DDSL variants, showing the advantage of frequency map flexibility. PCA projection improved NUFT-based model performance.

Spatial Relation Prediction (DBSR-46K, DBSR-cplx46K):
- Datasets created using DBpedia relations and OpenStreetMap geometries (US entities). DBSR-46K uses simple polygons, DBSR-cplx46K uses complex ones.
- NUFTspec[gmf]+MLP achieved the best performance on both datasets, significantly outperforming ResNet1D and other baselines (including DDSL variants and a deterministic GIS approach).
- NUFTspec demonstrated better handling of real-world challenges like sliver polygons and varying scales compared to deterministic methods and ResNet1D.
- ResNet1D's lower performance was attributed to its lack of topology awareness, which sometimes led it to mistake proximity for containment (Figure 11, 12).

Practical Implications and Implementation Considerations

Choice of Encoder:
- Use NUFTspec when the data include complex polygons (holes, multiple parts), or when invariance (loop origin, trivial vertices, part permutation) or topology awareness is crucial. It is generally more robust for tasks like spatial relation prediction.
- Use ResNet1D if the data are primarily simple polygons and fine local shape details are paramount for the task (as in some shape classification scenarios). It depends on circular padding for loop origin invariance, can handle complex polygons only by concatenating their rings, and is less robust to geometric variations.
Data Preparation:
- NUFTspec requires a robust pipeline for converting polygons to oriented simplex meshes and normalization.
- ResNet1D requires standardizing vertex sequences (e.g., sampling/padding) for batching. Handling complex polygons via concatenation is a simplification that loses information.
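
Both preprocessing steps can be sketched in NumPy. The helpers below are hypothetical: `normalize_to_box` applies one translation and one uniform scale to all rings together so that their joint bounding box fits in a target square (the $[0, 2] \times [0, 2]$ default mirrors the example given for NUFTspec preprocessing, and preserving the aspect ratio is an assumption), and `resample_ring` shows one common way, not necessarily the paper's, to give every ring the same number of vertices for batching by sampling points evenly along the perimeter.

```python
import numpy as np


def normalize_to_box(rings, lo=0.0, hi=2.0):
    """Translate and uniformly scale all rings together so their joint bounding box
    fits inside [lo, hi] x [lo, hi] (aspect ratio preserved)."""
    pts = np.concatenate(rings, axis=0)
    origin = pts.min(axis=0)
    extent = (pts.max(axis=0) - origin).max()
    scale = (hi - lo) / extent if extent > 0 else 1.0
    return [(ring - origin) * scale + lo for ring in rings]


def resample_ring(ring, n):
    """Return n points spaced evenly along the closed boundary of ring (m, 2).
    Assumes no repeated consecutive vertices (arc length must be strictly increasing)."""
    closed = np.vstack([ring, ring[:1]])                       # repeat the first vertex
    seg_len = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg_len)])          # arc length at each vertex
    targets = np.linspace(0.0, arc[-1], n, endpoint=False)
    return np.stack([np.interp(targets, arc, closed[:, 0]),
                     np.interp(targets, arc, closed[:, 1])], axis=1)


building = np.array([[10.0, 10.0], [14.0, 10.0], [14.0, 12.0], [10.0, 12.0]])
(normalized,) = normalize_to_box([building])
print(normalized.tolist())                 # [[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0]]
print(resample_ring(normalized, 6).shape)  # (6, 2)
```
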
NUFTspec Frequency Map: Using a geometric grid ( $W^{(gmf)}$ ) generally yields better results than a linear grid ( $W^{(fft)}$ ) but requires tuning the min/max frequency parameters.
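
The difference between the two frequency maps can be sketched as follows. Both constructions are assumptions for illustration rather than the paper's Definitions 8 and 9: the linear grid takes integer wave numbers in an FFT layout (full range on one axis, non-negative on the other, as `numpy.fft.rfft2` uses) and scales them by $2\pi/L$ for an assumed domain length $L$; the geometric grid spaces $S$ frequency magnitudes per axis geometrically between hypothetical parameters `w_min` and `w_max` and keeps only half of the frequency plane, which suffices because the transform of a real-valued function satisfies $F(-w) = \overline{F(w)}$.

```python
import numpy as np


def linear_grid(n, length=2.0):
    """Integer FFT wave numbers (rfft2-style half plane) scaled to angular frequency 2*pi*k/length."""
    kx = np.fft.fftfreq(n, d=1.0 / n)          # 0, 1, ..., n/2 - 1, -n/2, ..., -1
    ky = np.fft.rfftfreq(n, d=1.0 / n)         # 0, 1, ..., n/2
    gx, gy = np.meshgrid(kx, ky, indexing="ij")
    return 2 * np.pi / length * np.stack([gx.ravel(), gy.ravel()], axis=1)


def geometric_grid(s, w_min, w_max):
    """s magnitudes per axis in a geometric series from w_min to w_max (no zero frequency).
    The x axis gets both signs and the y axis only positive values, giving a half plane."""
    mags = np.geomspace(w_min, w_max, s)
    wx = np.concatenate([-mags[::-1], mags])
    gx, gy = np.meshgrid(wx, mags, indexing="ij")
    return np.stack([gx.ravel(), gy.ravel()], axis=1)


W_fft = linear_grid(8)
W_gmf = geometric_grid(4, 0.5, 8.0)
print(W_fft.shape, W_gmf.shape)  # (40, 2) (32, 2)
```

With geometric spacing, consecutive magnitudes differ by a constant ratio, so low frequencies are sampled more densely than high ones; the ratio `w_max / w_min` and the endpoints themselves are the min/max parameters that would need tuning.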
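Computational Cost: NUFT requires complex-valued arithmetic and, for every frequency, a sum over all simplices (one per boundary edge), so its cost grows with the number of edges times the number of frequencies. ResNet1D uses standard CNN operations. Which approach is cheaper depends on polygon complexity and dataset size.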
Code: The paper indicates that code is available; it would provide concrete implementation details for the polygon-to-simplex conversion, NUFT computation, and model architectures.

In summary, the paper presents NUFTspec as a robust and theoretically grounded method for encoding complex polygonal geometries, satisfying key invariance and topology properties beneficial for downstream GeoAI tasks, particularly spatial relation prediction. ResNet1D offers a simpler spatial-domain alternative effective for simple polygons but lacks robustness and topology awareness. The choice between them depends on the specific requirements of the application regarding geometric complexity, invariance needs, and the importance of local versus global features.
