Irregular Dilated Convolutions Overview
- Irregular dilated convolutions are convolution operators that learn adaptive kernel positions and dilation rates to encode multiscale, irregular features.
- They enable efficient processing of geometric variations in data such as images, graphs, and physical systems by overcoming limitations of fixed-grid sampling.
- This approach improves performance in applications like semantic segmentation, object detection, and multiscale PDE modeling while reducing computational overhead.
Irregular dilated convolutions generalize standard dilated convolutions by allowing the spatial arrangement, dilation pattern, and/or sample geometry of kernel elements to be learnable or data-adapted instead of fixed, regular, and grid-constrained. These methods enhance the flexibility of the convolution operation, enabling efficient encoding of geometric variations and scale diversity across a broad spectrum of data domains—from classical images to highly irregular graphs and multiscale physical systems. This article surveys definitions, mathematical frameworks, and implementations grounded in the technical literature, with an emphasis on connections to irregular kernels, learnable spacings, graph generalizations, and practical applications.
1. Definitions and Principal Mechanisms
Irregular dilated convolutions represent a family of convolutional operators for which the spatial support, dilation pattern, or sample distribution of kernel weights deviates from the fixed, regular grid structures characterizing canonical (atrous) convolutions. Concretely, irregularity may be instantiated through:
- Data-driven or learnable kernel positions, resulting in arbitrary non-grid-based weight placements
- Arbitrary, non-uniform, or hierarchical dilation rates within or across layers and channels
- Adaptive, graph-induced receptive fields dictated by the topology of input data
- Non-square or non-integer sample offsets, requiring differentiable interpolation
- Linear, non-quadratic parameter growth relative to kernel "size" (number of samples)
This generalization subsumes a spectrum of methods including generalized convolution on graphs (Vialatte et al., 2016), irregular convolutional kernels (Ma et al., 2017), inception convolution with axis- and channel-wise dilation (Liu et al., 2020), kernels with learnable spacings (Khalfaoui-Hassani et al., 2023), deformable and linearly deformable convolutions (Zhang et al., 2023), and operator-learning settings utilizing non-uniform dilation for multiscale representation (Xu et al., 2024).
2. Mathematical and Algorithmic Frameworks
2.1. Generalized Graph-Based Convolutions
For data supported on irregular domains, convolution can be formulated via an underlying graph where each node represents a signal location (e.g., a non-uniform pixel or sensor position). The convolutional update is defined by a local neighborhood mapping and a weight allocation matrix such that

$$y_i = \sum_{j \in \mathcal{N}(i)} \sum_{k} A^{(i)}_{kj}\, w_k\, x_j,$$

where $w_k$ are the shared kernel weights and the allocation matrix $A^{(i)}$ maps the kernel configuration onto the (possibly irregular) local neighborhood $\mathcal{N}(i)$. On regular grids, this construction recovers Toeplitz structure and standard convolution; on irregular domains, it mimics the moving-window characteristic while respecting the input topology (Vialatte et al., 2016).
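As an illustration, a minimal PyTorch sketch of such an allocation-based operator is shown below; the tensor shapes, the dense allocation tensor, and the einsum-based aggregation are illustrative assumptions rather than the exact construction of Vialatte et al.

```python
import torch
import torch.nn as nn

class AllocatedGraphConv(nn.Module):
    """Sketch of a graph-generalized convolution: shared kernel weights are
    distributed over each node's (possibly irregular) neighborhood by an
    allocation tensor. Shapes and the dense allocation scheme are assumptions."""

    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        # Shared weights, analogous to a K-tap convolution kernel.
        self.weight = nn.Parameter(torch.randn(kernel_size, in_ch, out_ch) * 0.01)

    def forward(self, x, neighbors, alloc):
        # x:         (N, in_ch)   node signals
        # neighbors: (N, M) long  indices of the M neighbors of each node
        # alloc:     (N, M, K)    allocation of the K kernel taps to neighbors
        x_nb = x[neighbors]                                         # (N, M, in_ch)
        w_eff = torch.einsum('nmk,kio->nmio', alloc, self.weight)   # per-neighbor weights
        return torch.einsum('nmi,nmio->no', x_nb, w_eff)            # (N, out_ch)
```

On a regular grid with a one-hot allocation per neighbor, this collapses to an ordinary sliding-window convolution.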
2.2. Learnable and Interpolated Sample Positions
Irregular kernels in the ICNN framework (Ma et al., 2017) are parameterized as a set of weight-position pairs

$$\mathcal{K} = \{(w_k, \mathbf{p}_k)\}_{k=1}^{K},$$

where $w_k$ are weights and $\mathbf{p}_k$ are learnable kernel positions. The convolution output at location $\mathbf{p}_0$ is given by bilinearly interpolated input values at the continuous positions $\mathbf{p}_0 + \mathbf{p}_k$:

$$y(\mathbf{p}_0) = \sum_{k=1}^{K} w_k\, x(\mathbf{p}_0 + \mathbf{p}_k).$$

Weights and positions are learned jointly.
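A minimal sketch of jointly learnable weights and continuous kernel positions, assuming grid_sample-based bilinear interpolation; the module name, random position initialization, and per-sample loop are illustrative, not the ICNN reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IrregularKernelConv2d(nn.Module):
    """Sketch: K weights paired with K learnable 2-D offsets (in pixels),
    sampled from the input by bilinear interpolation. Single group, no bias."""

    def __init__(self, in_ch, out_ch, num_samples=9, init_radius=1.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, num_samples) * 0.01)
        # Continuous (dx, dy) kernel positions, randomly placed near the center.
        self.offsets = nn.Parameter(init_radius * torch.randn(num_samples, 2))

    def forward(self, x):
        B, C, H, W = x.shape
        # Identity sampling grid in normalized [-1, 1] coordinates, (x, y) order.
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, H, device=x.device),
                                torch.linspace(-1, 1, W, device=x.device),
                                indexing='ij')
        base = torch.stack((xs, ys), dim=-1)                          # (H, W, 2)
        scale = torch.tensor([2.0 / (W - 1), 2.0 / (H - 1)], device=x.device)
        out = 0.0
        for k in range(self.offsets.shape[0]):
            grid = (base + self.offsets[k] * scale).expand(B, H, W, 2)
            sampled = F.grid_sample(x, grid, mode='bilinear', align_corners=True)
            # 1x1 mixing of input channels with the k-th weight slice.
            out = out + torch.einsum('bchw,oc->bohw', sampled, self.weight[..., k])
        return out
```

Because the sampled coordinates are continuous, gradients flow to both the weights and the positions.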
DCLS (Khalfaoui-Hassani et al., 2023) further generalizes this by associating a weight $w_k$ and a continuously learned position $(p_k, q_k)$ with each kernel element within a large $S \times S$ window; the dense kernel value at integer position $(i, j)$ is computed using triangle or Gaussian interpolation:

$$K(i, j) = \sum_k w_k\, \Lambda(i - p_k)\, \Lambda(j - q_k),$$

where $\Lambda$ is the interpolation function and normalization ensures sum-to-one contributions.
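The construction can be sketched compactly: each weight is spread over a dense kernel by a hat function centered at its learnable position, and the resulting kernel is used in a standard convolution. Function names, shapes, and the omission of normalization are illustrative simplifications; the official DCLS code uses its own, more efficient kernel construction.

```python
import torch
import torch.nn.functional as F

def dcls_dense_kernel(weights, pos, kernel_size):
    """Spread each scalar weight w_k over a dense S x S kernel using the
    triangle interpolation Lambda(t) = max(0, 1 - |t|) centered at (p_k, q_k).
    weights: (out_ch, in_ch, K); pos: (K, 2), values in [0, kernel_size - 1]."""
    grid = torch.arange(kernel_size, dtype=pos.dtype, device=pos.device)
    hat_r = (1 - (grid[None, :] - pos[:, 0:1]).abs()).clamp(min=0)    # (K, S)
    hat_c = (1 - (grid[None, :] - pos[:, 1:2]).abs()).clamp(min=0)    # (K, S)
    interp = hat_r[:, :, None] * hat_c[:, None, :]                    # (K, S, S)
    return torch.einsum('oik,krc->oirc', weights, interp)             # (out, in, S, S)

# Usage: build the dense kernel, then apply an ordinary convolution with it.
w = torch.randn(16, 3, 5, requires_grad=True)          # 5 weights per filter
pos = (torch.rand(5, 2) * 6).requires_grad_()          # positions inside a 7x7 window
y = F.conv2d(torch.randn(1, 3, 32, 32), dcls_dense_kernel(w, pos, 7), padding=3)
```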
2.3. Channel- and Axis-wise Dilation
Inception convolution (Liu et al., 2020) parameterizes dilation as an axis-wise tuple $d_c = (d_c^x, d_c^y)$ for each output channel $c$, allowing channel- and axis-dependent receptive fields. The optimal dilation configuration is determined via efficient dilation optimization (EDO), which minimizes the error in filter response between the regular and dilated configurations. This formulation enables massive receptive field diversity without additional inference cost.
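One rough way to emulate channel- and axis-wise dilation is to partition the output channels into groups, each convolved with its own dilation tuple and concatenated; the fixed dilation set and uniform grouping below are illustrative stand-ins for the per-channel assignments that EDO would select.

```python
import torch
import torch.nn as nn

class ChannelwiseDilatedConv(nn.Module):
    """Sketch: output channels split across branches, each with its own
    (d_y, d_x) dilation, mimicking axis-/channel-dependent receptive fields."""

    def __init__(self, in_ch, out_ch, kernel_size=3,
                 dilations=((1, 1), (1, 2), (2, 1), (2, 2))):
        super().__init__()
        assert out_ch % len(dilations) == 0
        chunk = out_ch // len(dilations)
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, chunk, kernel_size, dilation=d,
                      padding=(d[0] * (kernel_size // 2), d[1] * (kernel_size // 2)))
            for d in dilations
        ])

    def forward(self, x):
        # Every branch sees the same input but samples it at a different spacing.
        return torch.cat([branch(x) for branch in self.branches], dim=1)

# ChannelwiseDilatedConv(64, 64)(torch.randn(1, 64, 56, 56)).shape  # (1, 64, 56, 56)
```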
2.4. Linear Deformable and Arbitrary Shape Sampling
Linear Deformable Convolution, also termed Alterable Kernel Convolution (AKConv) (Zhang et al., 2023), generates kernels with arbitrary, possibly non-square initial sampling grids and applies learned offsets per location. The number of kernel parameters grows linearly with the number of samples, in contrast to the quadratic growth of standard square grids. The convolution operation becomes

$$y(\mathbf{p}_0) = \sum_{n=1}^{N} w_n\, x\big(\mathbf{p}_0 + \mathbf{p}_n + \Delta\mathbf{p}_n\big),$$

where $\mathbf{p}_0$ is the reference location and $\mathbf{p}_n + \Delta\mathbf{p}_n$ are the (possibly irregular) sample locations after the learned offsets $\Delta\mathbf{p}_n$.
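A compact sketch of the idea on top of torchvision's deform_conv2d: an arbitrary number N of samples (not required to form a square), per-location learned offsets, and a weight count linear in N. The (N x 1) weight layout (so the initial pattern is a column that the offsets reshape) and the zero-initialized offset branch are implementation assumptions, not the AKConv reference code.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class LinearDeformableConv(nn.Module):
    """Sketch: N arbitrary samples with learned offsets; parameters are
    out_ch * in_ch * N, i.e., linear rather than quadratic in kernel extent."""

    def __init__(self, in_ch, out_ch, num_samples=5):
        super().__init__()
        self.num_samples = num_samples
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, num_samples, 1) * 0.01)
        # Lightweight branch predicting a (dy, dx) offset per sample per location.
        self.offset_pred = nn.Conv2d(in_ch, 2 * num_samples, 3, padding=1)
        nn.init.zeros_(self.offset_pred.weight)
        nn.init.zeros_(self.offset_pred.bias)

    def forward(self, x):
        offset = self.offset_pred(x)                    # (B, 2N, H, W)
        return deform_conv2d(x, offset, self.weight,
                             padding=(self.num_samples // 2, 0))

# LinearDeformableConv(32, 64)(torch.randn(1, 32, 40, 40)).shape  # (1, 64, 40, 40)
```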
3. Practical Applications and Impact
3.1. Semantic Segmentation, Detection, and Dense Prediction
Irregular dilated convolutions, by virtue of their adaptive receptive fields, are favorable in tasks involving geometric variability and fine structure. ICNNs equipped with learnable kernel shapes outperform baseline DeepLab models with fixed filters on segmentation datasets such as PASCAL VOC, PASCAL Context, and Cityscapes (Ma et al., 2017). Inception convolution improves ImageNet classification, detection on MS COCO (e.g., an AP uplift from 36.4% to 38.9% for Faster R-CNN with a ResNet-50 backbone), and human pose estimation performance (Liu et al., 2020). Linear deformable convolution achieves mAP gains for large objects in the YOLOv5/YOLOv7/YOLOv8 detection frameworks while reducing hardware overhead (Zhang et al., 2023).
3.2. Multiscale PDE Operator Learning
DCNO (Xu et al., 2024) leverages dilated convolutions with irregular (hierarchical) dilation rates interleaved with Fourier layers. This hybrid captures both global low-frequency and local high-frequency phenomena, which is crucial for high-fidelity resolution of multiscale elliptic, Navier–Stokes, and Helmholtz equations. Experimental results show DCNO surpasses baselines (FNO, MWT, U-NO) in the accuracy–cost tradeoff, with particularly strong results in high-frequency or oscillatory solution regimes.
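A sketch of the convolutional half of such a layer, with geometrically growing dilation rates; the specific rates (1, 3, 9), the residual accumulation, and the GELU activation are assumptions, and the Fourier layers that DCNO interleaves with these blocks are omitted.

```python
import torch
import torch.nn as nn

class HierarchicalDilationBlock(nn.Module):
    """Sketch: stacked 3x3 convolutions whose dilation grows geometrically,
    so successive layers aggregate progressively coarser spatial scales."""

    def __init__(self, channels, dilations=(1, 3, 9)):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        ])
        self.act = nn.GELU()

    def forward(self, x):
        # Residual accumulation keeps fine-scale detail from early layers.
        for conv in self.convs:
            x = x + self.act(conv(x))
        return x
```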
3.3. Inpainting, Edge Detection, and Resolution Preservation
Dilated partial convolutions (DPConv) (Kınlı et al., 2020) address large, irregularly masked regions in inpainting applications, notably outperforming alternatives when mask area exceeds 20%. In edge detection, dilating classical filters extends spatial context without incurring extra computations, generally yielding higher F1 and precision scores across first-order gradient methods (Orhei et al., 2021).
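As a concrete illustration of the edge-detection case, a classical Sobel kernel can be applied at a larger dilation with the same nine taps and the same arithmetic cost, only a wider spatial support; the dilation rate below is chosen arbitrarily.

```python
import torch
import torch.nn.functional as F

# Standard vs. dilated application of the same first-order Sobel filter.
sobel_x = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]]).view(1, 1, 3, 3)

img = torch.rand(1, 1, 128, 128)                           # grayscale test image
edges_d1 = F.conv2d(img, sobel_x, padding=1)               # dilation 1 (standard)
edges_d3 = F.conv2d(img, sobel_x, padding=3, dilation=3)   # dilation 3, wider context
```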
4. Smoothing, Degridding, and Robustness
A recurring challenge in regular dilated convolutions is the emergence of gridding artifacts—spurious periodic gaps in receptive field coverage. To address this, several smoothing and degridding techniques have been developed:
- Group interaction layers and separable-and-shared (SS) convolutions ensure information is mixed across otherwise disjoint subsampling groups, providing local smoothing and enabling more continuous and robust feature propagation (Wang et al., 2018, Ziegler et al., 2019).
- Lateral inhibition modules enrich spatial sampling and sharpen boundary localization by subtracting distance-weighted local signals, yielding denser effective sampling (Wang et al., 2020). This can be naturally adapted to irregular convolutional patterns.
- Pre-filtering via averaging or Gaussian filters, or convex combinations thereof, allows sparse (dilated or irregular) sampling locations to "see" a larger context, mitigating the sparsity-induced information loss (Ziegler et al., 2019).
Such techniques are applicable in both regular and irregular dilated convolutional architectures, and their effectiveness is demonstrated empirically in boosting segmentation, localization, and feature smoothness metrics.
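As a concrete example of the pre-filtering strategy listed above, the sketch below applies a fixed depthwise averaging filter before a dilated convolution so that each sparse sampling location aggregates its local neighborhood first; the box filter, its size, and the module structure are illustrative assumptions (a Gaussian or a learned convex combination works analogously).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmoothedDilatedConv(nn.Module):
    """Sketch of degridding by pre-filtering: a depthwise box filter smooths
    the input before a dilated 3x3 convolution samples it sparsely."""

    def __init__(self, channels, dilation=4, pre_size=3):
        super().__init__()
        # Fixed averaging filter applied to every channel independently.
        self.register_buffer('pre_filter',
                             torch.full((channels, 1, pre_size, pre_size),
                                        1.0 / pre_size ** 2))
        self.pre_pad = pre_size // 2
        self.dilated = nn.Conv2d(channels, channels, kernel_size=3,
                                 padding=dilation, dilation=dilation)

    def forward(self, x):
        x = F.conv2d(x, self.pre_filter, padding=self.pre_pad, groups=x.shape[1])
        return self.dilated(x)
```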
5. Performance Metrics, Parameterization, and Efficiency
Irregular dilated convolution approaches typically seek to maximize model expressiveness and receptive field coverage without incurring correspondingly high computational or memory cost. Key findings include:
- Linear parameter growth is achievable with AKConv (Zhang et al., 2023), ensuring scalability to broader or denser kernels compared to the quadratic growth of standard square-grid convolutions.
- Channel- and axis-adaptive dilation via inception convolution preserves computational cost at inference while boosting accuracy across classification and detection tasks (Liu et al., 2020).
- Learnable spacings (DCLS) with efficient interpolation maintain model parameter counts while improving training loss and Top-1/Top-5 accuracy (Khalfaoui-Hassani et al., 2023).
- Hierarchical or non-uniform dilation sequences, as in DCNO, optimize multiscale capture of physical phenomena and balance cost against accuracy (Xu et al., 2024).
A plausible implication is that the flexible parameterization of irregular dilated convolutions can achieve a superior tradeoff between accuracy, model size, and computational resource consumption in domains where scale and domain irregularity are central.
6. Theoretical and Implementation Considerations
- Differentiable parameterization (via bilinear, triangle, or Gaussian interpolation) is central for learning continuous kernel positions and ensuring gradient flow through irregular sampling (Ma et al., 2017, Khalfaoui-Hassani et al., 2023).
- Allocation matrices ensure consistent weight sharing across irregular neighborhoods in graph-based domains (Vialatte et al., 2016).
- Efficient search procedures (such as EDO) make large search spaces for axis- and channel-wise dilation patterns tractable and practical on large-scale datasets (Liu et al., 2020).
- Implementation frameworks support plug-and-play integration of irregular convolutional modules, including PyTorch-based reference code for DCLS and AKConv (Khalfaoui-Hassani et al., 2023, Zhang et al., 2023).
Careful attention to boundary handling, interpolation range, and associated smoothing strategies is warranted to ensure training stability and generalization.
7. Future Directions and Open Questions
The literature suggests several ongoing research avenues:
- Extending irregular dilated convolutions to further non-Euclidean, graph, or manifold-supported data domains
- Developing more expressive yet computationally tractable parameterizations (e.g., non-rectangular, asymmetrical, or attention-weighted sample supports)
- Integrating irregular convolution modules with neural architecture search or meta-learning frameworks for automatic adaptation to diverse data
- Investigating the theoretical impact of irregular sampling on spectral bias, information mixing, and generalization in deep networks
Emerging applications in multiscale physical modeling, video analysis, graph learning, inpainting, and beyond are poised to benefit from the flexibility and localized adaptability conferred by irregular dilated convolution methods.
In summary, irregular dilated convolutions encompass a set of generalizations to the standard dilated convolutional paradigm, enabling flexible and data-adaptive receptive fields. By introducing learnable or data-constrained irregularity in sampling geometry, dilation patterns, and weight allocation, these approaches enhance the capacity of convolutional operators to model complex, irregular, and multiscale phenomena, with broad support across image analysis, segmentation, detection, operator learning, and other domains.