Spatially Adaptive Bilateral Grids

Updated 27 July 2025

Spatially adaptive bilateral grids are high-dimensional data structures that extend classical bilateral filtering to support localized, content-aware image and signal processing.
They employ dynamic grid construction and adaptive kernel modulation to achieve efficient, real-time filtering while preserving key image structures.
Integrating mathematical foundations with deep learning frameworks, these grids enhance applications such as denoising, neural rendering, and scene modeling.

Spatially adaptive bilateral grids are high-dimensional data structures and filtering frameworks that generalize classical bilateral filtering to enable localized, content-aware, and efficient image and signal processing. These grids discretize both spatial and non-spatial signal features (most commonly intensity or color), supporting edge-aware and context-dependent smoothing, denoising, appearance modeling, and transformation tasks. Unlike traditional bilateral grids, spatially adaptive variants allow their parameters or even their grid structure to be dynamically modulated, either algorithmically or through learned models, in order to better preserve structure, adapt to content, and achieve real-time or large-scale performance across diverse applications.

1. Mathematical Foundations and Grid Construction

The bilateral grid consists of a regular discretization of the 2D spatial domain (image coordinates) jointly with an additional feature or "guidance" dimension (such as intensity, color, or depth). For an image $I: \Omega \rightarrow \mathbb{R}$, each pixel at location $\mathbf{x} = (x, y)$ with value $I(\mathbf{x})$ is mapped to a grid cell at coordinates $(x/r, y/r, I(\mathbf{x})/r)$, with $r$ controlling the grid subsampling and effective window size (Hashimoto et al., 2021). Processing in the bilateral grid proceeds via three canonical steps:

Projection: Scatter image values into grid cells according to quantized spatial/feature coordinates.
Filtering: Apply convolution (typically Gaussian or box) within the grid.
Slicing: Interpolate grid results back to dense image space.
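The three steps can be sketched in a few lines of NumPy. This is a minimal illustration rather than any cited implementation: it assumes a grayscale image with values in [0, max_value], uses the single sampling rate $r$ from the text for all three axes, scatters each pixel to its nearest cell, blurs with a small [1, 2, 1] / 4 kernel, and slices with trilinear interpolation. The function names are hypothetical.

```python
import numpy as np


def _blur_axis(vol, axis):
    """Apply a zero-padded [1, 2, 1] / 4 blur along one grid axis."""
    pad = [(0, 0)] * vol.ndim
    pad[axis] = (1, 1)
    p = np.pad(vol, pad)
    n = vol.shape[axis]
    lo, mid, hi = (np.take(p, np.arange(k, k + n), axis=axis) for k in range(3))
    return 0.25 * lo + 0.5 * mid + 0.25 * hi


def _trilinear(vol, gy, gx, gz):
    """Sample a 3D grid at continuous (gy, gx, gz) grid coordinates."""
    base, frac = [], []
    for coord, size in zip((gy, gx, gz), vol.shape):
        c0 = np.clip(np.floor(coord).astype(int), 0, size - 2)
        base.append(c0)
        frac.append(coord - c0)
    out = np.zeros(gy.shape)
    for dy in (0, 1):
        for dx in (0, 1):
            for dz in (0, 1):
                wgt = ((frac[0] if dy else 1 - frac[0])
                       * (frac[1] if dx else 1 - frac[1])
                       * (frac[2] if dz else 1 - frac[2]))
                out += wgt * vol[base[0] + dy, base[1] + dx, base[2] + dz]
    return out


def bilateral_grid_filter(image, r=8.0, max_value=255.0):
    """Edge-aware smoothing via projection, grid blur, and slicing."""
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    # +2 leaves room for rounding and for the upper trilinear neighbour.
    shape = tuple(int(np.ceil(m / r)) + 2 for m in (h - 1, w - 1, max_value))
    ys, xs = np.mgrid[0:h, 0:w]
    gy, gx, gz = ys / r, xs / r, img / r  # (x/r, y/r, I(x)/r) from the text

    # 1. Projection: accumulate (value, count) pairs in the nearest cell.
    idx = tuple(np.rint(c).astype(int) for c in (gy, gx, gz))
    data = np.zeros(shape)
    weight = np.zeros(shape)
    np.add.at(data, idx, img)
    np.add.at(weight, idx, 1.0)

    # 2. Filtering: separable blur over the two spatial axes and the range axis.
    for axis in range(3):
        data = _blur_axis(data, axis)
        weight = _blur_axis(weight, axis)

    # 3. Slicing: interpolate at each pixel's own coordinates and normalize.
    num = _trilinear(data, gy, gx, gz)
    den = _trilinear(weight, gy, gx, gz)
    return num / np.maximum(den, 1e-12)
```

Because pixels on opposite sides of a strong edge land in distant cells along the intensity axis, the small grid blur never mixes them, which is where the edge-preserving behavior comes from.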

Spatial adaptivity can be introduced at multiple stages: by varying the grid sampling (adaptive allocation of resolution or bin size), by dynamically modulating filter parameters per cell or region, or by learning context-sensitive operations that alter either the grid itself or the filtering within it (Gavaskar et al., 2018, Liu et al., 2019, Wang et al., 5 Jun 2025, Shin et al., 21 Jul 2025).

2. Acceleration and Complexity Reduction via Grid-Based Bilateral Filtering

Direct bilateral filtering costs $O(S)$ operations per pixel, where $S$ is the number of pixels in the spatial kernel support, which is prohibitive for large-scale or real-time applications. Several lines of research have established that bilateral filters can be approximated in constant or linear time via grid-based or separable decompositions, kernel expansions, and dimension lifting.

The Gaussian-polynomial approximation (GPA) method decomposes the range kernel as $g_{\sigma_{r}}(t-\tau) = \exp(-\tau^2 / (2\sigma_r^2)) \exp(-t^2 / (2\sigma_r^2)) \exp(\tau t / \sigma_r^2)$ and approximates the cross term $\exp(\tau t / \sigma_r^2)$ by its Taylor expansion, which turns the nonlinear filter into a series of $N+1$ linear spatial filterings (Chaudhury et al., 2016). This enables $O(1)$ per-pixel implementations with explicit error bounds and tunable approximation order $N$.
Further acceleration can be achieved by truncating kernels, expanding them in best $N$-term (Haar for spatial, trigonometric for range) basis sets, expressing each term as a box filter, and promoting the operation to a 3D (spatial + range) volume. Summed area tables or fast convolutions over these volumes then yield a grid-like 3D box filtering scheme, enabling constant-time or linear complexity filtering independently of the spatial support (Dai et al., 2018).

These frameworks are tightly linked to bilateral grids, as the core operation is cast as convolution on a spatial-feature grid, with adaptivity introduced by best term selection or kernel parameterization.
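The Gaussian-polynomial idea can be checked numerically on a 1D signal. The sketch below is an illustration under stated assumptions, not the authors' code: intensities are centered around the mid-range value to keep the product $\tau t$ small, the spatial filter is a zero-padded Gaussian applied with `np.convolve`, and the names `bilateral_gpa` and `bilateral_direct` are hypothetical. The factor $\exp(-\tau^2 / (2\sigma_r^2))$ appears in both numerator and denominator and cancels, so it is never computed.

```python
import numpy as np
from math import factorial


def gaussian_kernel(radius, sigma_s):
    """Symmetric, normalized 1D Gaussian spatial kernel."""
    j = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (j / sigma_s) ** 2)
    return k / k.sum()


def bilateral_direct(f, kernel, sigma_r):
    """Brute-force bilateral filter: one weighted sum per sample."""
    f = np.asarray(f, dtype=float)
    radius = len(kernel) // 2
    out = np.empty_like(f)
    for i in range(len(f)):
        lo, hi = max(0, i - radius), min(len(f), i + radius + 1)
        spatial = kernel[lo - i + radius:hi - i + radius]
        rng_w = np.exp(-0.5 * ((f[lo:hi] - f[i]) / sigma_r) ** 2)
        out[i] = np.sum(spatial * rng_w * f[lo:hi]) / np.sum(spatial * rng_w)
    return out


def bilateral_gpa(f, kernel, sigma_r, order=10):
    """Bilateral filter via a Taylor expansion of exp(tau * t / sigma_r^2)."""
    f = np.asarray(f, dtype=float)
    t0 = 0.5 * (f.max() + f.min())      # centering keeps |tau * t| small
    t = (f - t0) / sigma_r              # normalized, centered intensities
    h = np.exp(-0.5 * t ** 2)
    # Linear spatial filterings of the images t^n * h, n = 0 .. order + 1.
    filtered = [np.convolve(t ** n * h, kernel, mode="same")
                for n in range(order + 2)]
    num = np.zeros_like(t)
    den = np.zeros_like(t)
    for n in range(order + 1):
        coeff = t ** n / factorial(n)   # tau^n / n! evaluated at the center sample
        den += coeff * filtered[n]
        num += coeff * filtered[n + 1]
    return t0 + sigma_r * num / den
```

On a noisy step with values in roughly [0, 1] and $\sigma_r = 0.5$, the two functions should agree closely at order 10. The approximation degrades when the intensity range is large relative to $\sigma_r$, which is why the order $N$ has to be tuned.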

3. Adaptive and Context-Aware Bilateral Grid Filtering

Spatially adaptive bilateral grids generalize fixed parameter bilateral filtering by dynamically adjusting filter parameters and even the grid structure:

Adaptive Range and Spatial Kernels: In adaptive bilateral filtering, the center and width of the range kernel are allowed to vary per pixel; parameters $\theta(i)$ and $\sigma(i)$ are estimated locally or based on auxiliary data (e.g., texture, total variation metrics) (Gavaskar et al., 2018). The resulting grid is "stretched" or locally reweighted to match per-pixel intensity statistics. Polynomial moment matching and integration by parts allow an efficient $O(1)$ per-pixel implementation.
Hierarchical and Multi-Scale Adaptation: Multi-kernel filtering introduces a hierarchy of coherent image clusters through EM and spatial connectedness, forming a cluster tree. Local intensity variation ($\delta_{t,k}$) and contextual modulation terms ($\Psi_{t,k}$) derived from this hierarchy parameterize adaptive kernels per region, so the effective filter response is spatially and contextually modulated (Liu et al., 2019). This approach suggests constructing grids with multi-scale, context-aware adaptivity.
Edge-Directed and Directional Filtering: For depth or geometry inference tasks, directional joint bilateral filters align the spatial kernel with edge orientation, modulating filtering support along detected edge directions and integrating both spatial and intensity/depth cues (Sindhu et al., 2017).
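For reference, the adaptive range kernel from the first item can be written as a brute-force filter. This is a sketch assuming a Gaussian spatial kernel and a Gaussian range kernel centered at $\theta(i)$ with width $\sigma(i)$; it is the slow direct evaluation, not the $O(1)$ algorithm mentioned above, and the function name and defaults are hypothetical.

```python
import numpy as np


def adaptive_bilateral(f, theta, sigma, radius=2, sigma_s=1.5):
    """Bilateral filter whose range kernel has per-pixel center and width.

    f     : (h, w) input image
    theta : (h, w) per-pixel center of the range kernel
    sigma : scalar or (h, w) per-pixel width of the range kernel
    """
    f = np.asarray(f, dtype=float)
    h, w = f.shape
    fp = np.pad(f, radius, mode="edge")
    num = np.zeros_like(f)
    den = np.zeros_like(f)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbour values f(i + offset) for every pixel i at once.
            nb = fp[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            w_spatial = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
            w_range = np.exp(-((nb - theta) ** 2) / (2.0 * sigma ** 2))
            num += w_spatial * w_range * nb
            den += w_spatial * w_range
    # den can approach zero if theta is far from every neighbour value.
    return num / np.maximum(den, 1e-12)
```

Setting `theta = f` and a constant `sigma` recovers the classical bilateral filter. Shifting `theta` away from `f` or varying `sigma` per pixel gives the locally adapted behavior described above.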

4. Deep Learning and Hybrid Adaptive Grids

Recent advances incorporate learnable or transformer-based models to predict or parameterize spatially adaptive bilateral grids, with notable applications in high-fidelity image enhancement and neural rendering:

Learning Bilateral Grids of Parameters: In the BPAM framework, the bilateral grid stores pixel-wise MLP parameters, enabling construction of per-pixel, locally adapted, lightweight MLPs for color mapping. Slicing operations retrieve relevant weights for each pixel, permitting highly nonlinear, spatially adaptive transformations, which outperform affine-grid and global-MLP approaches (Lou et al., 16 Jul 2025).
Transformers for Multi-View Harmonization: A feed-forward transformer predicts spatially adaptive bilateral grids to correct inter-view photometric inconsistencies in multi-view tasks. Cross-view and intra-view attention mechanisms fuse context and spatial detail, with the transformer head outputting a low-resolution grid of affine parameters per image, later upsampled and applied via edge-aware slicing. Confidence (uncertainty) grids modulate supervision and downweight hard-to-correct regions (Shin et al., 21 Jul 2025).

Both approaches demonstrate that bilateral grids, endowed with spatially adaptive, learnable parameterization, can model complex, non-linear, and locally varying photometric transformations with high efficiency.
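The slicing step shared by these methods can be illustrated with a grid of affine color transforms. The sketch below assumes a grid of shape (gh, gw, gd, 3, 4) holding one 3×4 affine matrix per cell, a guidance map in [0, 1], and at least two cells along every grid axis. These conventions and the function names are assumptions for illustration and are not taken from BPAM or the transformer model.

```python
import numpy as np


def slice_affine_grid(grid, guide):
    """Trilinearly slice per-pixel 3x4 affine matrices from a bilateral grid.

    grid  : (gh, gw, gd, 3, 4) low-resolution grid of affine coefficients
    guide : (h, w) guidance map in [0, 1] (e.g. luminance)
    returns (h, w, 3, 4) per-pixel coefficients
    """
    gh, gw, gd = grid.shape[:3]
    h, w = guide.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = (
        ys * (gh - 1) / max(h - 1, 1),           # image row  -> grid row
        xs * (gw - 1) / max(w - 1, 1),           # image col  -> grid col
        np.clip(guide, 0.0, 1.0) * (gd - 1),     # guidance   -> grid depth
    )
    base, frac = [], []
    for c, size in zip(coords, (gh, gw, gd)):
        c0 = np.clip(np.floor(c).astype(int), 0, size - 2)
        base.append(c0)
        frac.append(c - c0)
    out = np.zeros((h, w) + grid.shape[3:])
    for dy in (0, 1):
        for dx in (0, 1):
            for dz in (0, 1):
                wgt = ((frac[0] if dy else 1 - frac[0])
                       * (frac[1] if dx else 1 - frac[1])
                       * (frac[2] if dz else 1 - frac[2]))
                cell = grid[base[0] + dy, base[1] + dx, base[2] + dz]
                out += wgt[..., None, None] * cell
    return out


def apply_affine(coeffs, rgb):
    """Apply per-pixel 3x4 affine transforms to an (h, w, 3) image."""
    return np.einsum("hwij,hwj->hwi", coeffs[..., :3], rgb) + coeffs[..., 3]
```

Because the depth coordinate comes from the guidance map, neighboring pixels with different guidance values read different coefficients, which keeps the upsampled transform edge-aware. A grid that stores MLP weights instead of affine matrices could be sliced the same way, with a different trailing shape.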

5. Hardware Implementations and Real-Time Processing

Spatially adaptive bilateral grids have been instantiated in hardware, especially for resource-constrained or real-time environments:

The FPGA-based design "lifts" the image into a 3D grid, with projection governed by a variable window parameter $r$: $fv(i) = (i/r, f(i)/r)$. Unlike conventional bilateral filtering, where hardware resources grow with the window size, this approach maintains a fixed $3\times3\times3$ grid neighborhood and adapts the effective window size by scaling the inputs. This suppresses increases in memory and computation even as $r$ grows, owing to a fully pipelined, BRAM-partitioned architecture (Hashimoto et al., 2021). Trilinear interpolation reconstructs the output from the blurred grid. This realization enables constant-throughput, scalable, high-resolution processing, and demonstrates the practical benefits of formulating spatially adaptive bilateral filtering as fixed-window grid processing.
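A rough cost model makes the scaling argument concrete. The arithmetic below is an illustrative assumption, not the resource model of the cited design: it counts grid cells for an 8-bit image lifted with parameter $r$, and compares the fixed 27-cell grid neighborhood with a direct bilateral window whose radius is taken to equal $r$. The function name and returned keys are hypothetical.

```python
from math import ceil


def grid_cost(height, width, r, max_value=255):
    """Back-of-the-envelope cost figures for a lifted grid with parameter r."""
    cells = ((ceil(height / r) + 1)
             * (ceil(width / r) + 1)
             * (ceil(max_value / r) + 1))
    return {
        "grid_cells": cells,                        # storage shrinks as r grows
        "grid_taps_per_cell": 3 ** 3,               # fixed 3x3x3 neighbourhood
        "direct_taps_per_pixel": (2 * r + 1) ** 2,  # assumed window radius r
    }


for r in (4, 8, 16, 32):
    print(r, grid_cost(1080, 1920, r))
```

Under these assumptions the grid neighborhood stays at 27 taps for every $r$, while the direct window grows quadratically and the number of grid cells falls.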

6. Extensions in Neural Rendering and Scene Modeling

Spatially adaptive bilateral grids have been leveraged to mitigate photometric inconsistencies and enhance geometric reconstruction in neural rendering:

Multi-Scale Bilateral Grids for Appearance Modeling: In the context of neural rendering for autonomous driving, multi-scale bilateral grids are introduced to unify global (appearance code) and local (pixel-wise) appearance modeling (Wang et al., 5 Jun 2025). A hierarchy of grids, each covering progressively finer spatial and guidance (intensity) resolutions, enables composition of global and local affine transformations. Grid slicing operations across levels refine the rendered output, improving geometric accuracy and reducing artifacts (e.g., "floaters") in dynamic scenes.
3DGS Pipeline Integration: Spatially adaptive bilateral grids are directly integrated as pre-processing or correction modules within pipelines such as 3D Gaussian Splatting. Learned grids predict affine transformations per pixel to harmonize appearance, while confidence maps weight regions by reliability. This decouples photometric correction from scene optimization, accelerating convergence and improving view synthesis fidelity relative to baseline and scene-specific optimization methods (Shin et al., 21 Jul 2025).

These developments illustrate not only the modeling flexibility offered by spatially adaptive bilateral grids but also their importance in enabling cross-view consistency, large-scale scene modeling, and robust operation in the presence of sensor and environmental variation.
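One simple way to combine affine transforms from several grid levels is to apply them in sequence, coarse to fine, which is equivalent to multiplying their homogeneous matrices. The sketch below assumes each level has already been sliced to per-pixel 3×4 coefficients. The source does not specify the exact composition rule used by the multi-scale method, so sequential application is an assumption here, and the function names are hypothetical.

```python
import numpy as np


def to_homogeneous(affine):
    """Promote (..., 3, 4) affine matrices to (..., 4, 4) homogeneous form."""
    bottom = np.zeros(affine.shape[:-2] + (1, 4))
    bottom[..., 0, 3] = 1.0
    return np.concatenate([affine, bottom], axis=-2)


def compose_levels(levels):
    """Compose per-pixel affine maps, ordered coarse to fine.

    levels : list of (h, w, 3, 4) arrays; levels[0] is applied first.
    returns a single (h, w, 3, 4) array with the same overall effect.
    """
    total = to_homogeneous(levels[0])
    for level in levels[1:]:
        # A later (finer) level acts on the output of the earlier ones.
        total = to_homogeneous(level) @ total
    return total[..., :3, :]


def apply_affine(coeffs, rgb):
    """Apply per-pixel 3x4 affine transforms to an (h, w, 3) image."""
    return np.einsum("hwij,hwj->hwi", coeffs[..., :3], rgb) + coeffs[..., 3]
```

Composing the levels first means the rendered image is transformed only once per pixel, regardless of how many grid levels contribute.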

7. Theoretical and Practical Significance

Spatially adaptive bilateral grids unify several distinct lines of research: kernel approximation, data-dependent parameterization, grid-based signal representation, and deep learning-driven feature modeling. Their theoretical core lies in trading off local adaptivity, computational efficiency, and approximation fidelity. Adaptation, whether algorithmic (kernel selection, local statistics), hierarchical (cluster trees), or learned (transformers, MLP parameter prediction), enables image and geometry processing capabilities beyond those of fixed-parameter or fixed-structure baselines.

Applications span denoising, depth completion, HDR imaging, appearance harmonization, neural rendering, and real-time enhancement on both software and hardware platforms. Research continues into improving their expressiveness (nonlinear modeling, grid decomposition), their training and inference efficiency, and their applicability to non-image domains.

In summary, spatially adaptive bilateral grids provide a mathematically principled, algorithmically efficient, and practically adaptable framework for edge-aware, locally varying, and high-dimensional signal filtering, and they serve as a building block in contemporary vision, imaging, and rendering systems.