Differentiable Probabilistic Rasterization

Updated 2 December 2025

One line of work presents LucidRaster, a GPU-based software rasterizer that achieves exact order-independent transparency using a two-stage sorting pipeline and per-pixel depth filtering.
A second line of work leverages succinct data structures, such as $k^2$-trees and $k^3$-trees, to support efficient point, range, and interval queries on both static and time-evolving raster fields.
Reported results for LucidRaster show minimal blending artifacts and a mean slowdown of roughly 3.3$\times$ relative to hardware alpha blending, at the cost of a substantial memory footprint.

Differentiable probabilistic rasterization refers to methods and data structures enabling efficient, queryable, and often exact processing of raster data, particularly supporting operations required for transparency, probabilistic accumulation, and differentiable computation in raster graphics and spatial data analysis. This domain encompasses GPU software rasterizers for pixel-accurate order-independent transparency (OIT), as well as succinct memory representations for general or time-evolving rasters that support probabilistic or range/value queries with high performance.

1. Principled Frameworks for Differentiable Probabilistic Rasterization

LucidRaster exemplifies a GPU-based software rasterizer designed for exact OIT, addressing the long-standing issue of correct and glitch-free transparency rendering in complex scenes. The pipeline is characterized by a two-stage sorting mechanism: (A) a coarse, block-level bitonic sort in shared memory, organizing fragments by 32-bit keys $K = (D_1 \ll 10) + I$, where $D_1$ is a 22-bit quantized depth and $I$ a 10-bit triangle index; (B) a fine-grained, per-pixel min-heap “depth-filter” of size $F$ (default $F=3$), which buffers up to $F$ not-yet-blended fragments for each pixel and blends each evicted fragment in accurate front-to-back order (Jakubowski, 22 May 2024).

This sorting pipeline permits exact color and alpha composition for transparency, respecting the per-sample blend equation $\mathbf{C}_{\text{out}} = \sum_{i=1}^n \mathbf{C}_i\,\alpha_i \prod_{j<i}(1-\alpha_j)$, where fragments are indexed from nearest ($i=1$) to farthest ($i=n$). The equation is evaluated as a per-pixel front-to-back accumulation, in the same way hardware blending would evaluate it.
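As a minimal sketch of the blend equation (not LucidRaster's shader code), the following Python evaluates the closed-form sum and the equivalent running front-to-back accumulation. The function names and the tuple-based fragment layout are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed fragment layout: ((r, g, b), alpha), listed nearest first.
Fragment = Tuple[Tuple[float, float, float], float]


def blend_closed_form(fragments: Sequence[Fragment]) -> List[float]:
    """Evaluate C_out = sum_i C_i * alpha_i * prod_{j<i} (1 - alpha_j) directly."""
    out = [0.0, 0.0, 0.0]
    for i, (color, alpha) in enumerate(fragments):
        transmittance = 1.0
        for _, alpha_j in fragments[:i]:
            transmittance *= 1.0 - alpha_j
        for c in range(3):
            out[c] += color[c] * alpha * transmittance
    return out


def blend_front_to_back(fragments: Sequence[Fragment]) -> Tuple[List[float], float]:
    """Same result via a running accumulator; 1 - a_acc is the transmittance so far."""
    acc = [0.0, 0.0, 0.0]
    a_acc = 0.0
    for color, alpha in fragments:
        weight = (1.0 - a_acc) * alpha
        for c in range(3):
            acc[c] += weight * color[c]
        a_acc += weight
    return acc, a_acc


if __name__ == "__main__":
    frags = [((1.0, 0.0, 0.0), 0.5), ((0.0, 0.0, 1.0), 0.5)]  # red in front of blue
    print(blend_closed_form(frags))
    print(blend_front_to_back(frags))
```

The running form needs only one pass and constant state per pixel, which is why it suits a streaming per-pixel pipeline.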

Within spatial data management, compact probabilistic representations leverage succinct tree structures, such as $k^2$ -trees and $k^3$ -trees, enabling point, range, and aggregate queries regarding observed or expected values across a raster field. Range and value queries map naturally to probabilistic interpretations in GIS applications, supporting operations such as likelihood-of-occurrence estimation over an interval or region (Brisaboa et al., 2019).

2. Algorithmic Structures, Sorting, and Accumulation

LucidRaster’s rasterization proceeds in three principal computation stages: setup, binning, and rasterization. During rasterization, for each 8×4-pixel half-block, a fixed-size list of “tri-half-blocks” records all samples of each intersecting triangle. A 32-bit primary sort key, computed as $K = (D_1 \ll 10) + I$, ensures block-level fragment ordering in shared memory via bitonic sort. This local ordering is refined per pixel by a min-heap (the “depth-filter”) that holds up to $F$ entries, each pairing a depth with a color/alpha value.
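The key layout and block-level sort can be sketched on the CPU as follows. This is an illustration under stated assumptions rather than the GPU implementation: the linear depth quantization, the helper names, and the sequential bitonic network are hypothetical, and only the 22-bit/10-bit split comes from the description above.

```python
from typing import List, Tuple

DEPTH_BITS = 22  # quantized depth D_1
INDEX_BITS = 10  # triangle index I


def quantize_depth(depth: float) -> int:
    """Map a depth in [0, 1] to a 22-bit integer (linear quantization is an assumption)."""
    d = min(max(depth, 0.0), 1.0)
    return min(int(d * (1 << DEPTH_BITS)), (1 << DEPTH_BITS) - 1)


def pack_key(d1: int, tri_index: int) -> int:
    """K = (D_1 << 10) + I; depth occupies the high bits so it dominates the ordering."""
    assert 0 <= d1 < (1 << DEPTH_BITS) and 0 <= tri_index < (1 << INDEX_BITS)
    return (d1 << INDEX_BITS) | tri_index


def unpack_key(key: int) -> Tuple[int, int]:
    return key >> INDEX_BITS, key & ((1 << INDEX_BITS) - 1)


def bitonic_sort(keys: List[int]) -> List[int]:
    """Ascending bitonic sorting network; the length must be a power of two."""
    a = list(keys)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "bitonic sort needs a power-of-two length"
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            for i in range(n):  # on a GPU each i would be one thread
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


if __name__ == "__main__":
    frags = [(0.75, 3), (0.25, 7), (0.25, 2), (0.5, 1)]  # (depth, triangle index)
    keys = [pack_key(quantize_depth(d), t) for d, t in frags]
    print([unpack_key(k) for k in bitonic_sort(keys)])
```

Because the triangle index sits in the low bits, fragments with equal quantized depth are ordered deterministically by triangle index.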

Each incoming sample is pushed into the depth-filter, which buffers at most $F$ samples. If the push overflows the filter, the nearest buffered sample (the heap minimum) is evicted and immediately blended into the accumulation buffer via

$\mathbf{C}_\mathrm{acc} \leftarrow \mathbf{C}_\mathrm{acc} + (1-A_\mathrm{acc})\cdot(\alpha_k \mathbf{C}_k)$

$A_\mathrm{acc} \leftarrow A_\mathrm{acc} + (1-A_\mathrm{acc})\cdot\alpha_k$

After all samples, any remaining entries in the heap are blended out in increasing depth order.
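A small CPU sketch of such a depth-filter is given below, using Python's `heapq` as the min-heap. It assumes that the evicted sample is the heap minimum, i.e. the nearest buffered sample, which is what the front-to-back update equations require. The function name and the sample layout are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple

# Assumed sample layout: (depth, (r, g, b), alpha); smaller depth means nearer.
Sample = Tuple[float, Tuple[float, float, float], float]


def depth_filter_composite(samples: Iterable[Sample],
                           filter_size: int = 3) -> Tuple[List[float], float]:
    """Blend samples that arrive only approximately sorted by depth.

    A min-heap buffers up to `filter_size` samples; on overflow the nearest
    buffered sample is popped and blended front to back.
    """
    heap: list = []
    acc = [0.0, 0.0, 0.0]
    a_acc = 0.0

    def blend(color, alpha):
        nonlocal a_acc
        weight = (1.0 - a_acc) * alpha       # C_acc += (1 - A_acc) * alpha_k * C_k
        for c in range(3):
            acc[c] += weight * color[c]
        a_acc += weight                      # A_acc += (1 - A_acc) * alpha_k

    for seq, (depth, color, alpha) in enumerate(samples):
        # `seq` breaks depth ties so equal depths keep their arrival order.
        heapq.heappush(heap, (depth, seq, color, alpha))
        if len(heap) > filter_size:
            _, _, c, a = heapq.heappop(heap)
            blend(c, a)

    while heap:                              # drain in increasing depth order
        _, _, c, a = heapq.heappop(heap)
        blend(c, a)
    return acc, a_acc


if __name__ == "__main__":
    green, red, blue = (0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
    reversed_input = [(0.3, blue, 0.5), (0.2, red, 0.5), (0.1, green, 0.5)]
    print(depth_filter_composite(reversed_input, filter_size=2))  # order repaired
    print(depth_filter_composite(reversed_input, filter_size=1))  # filter too small
```

In this sketch a filter of size $F$ repairs a sample that arrives up to $F$ positions later than its sorted position. A sample displaced further is blended out of order, which is the kind of error a larger $F$ reduces.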

Optimization includes an alpha-threshold early-out: when $A_\mathrm{acc} \ge 1-\epsilon$ , further shading is skipped for the half-block, with $\epsilon = 1/128$ . This mechanism directly exploits probabilistic occlusion for efficiency (Jakubowski, 22 May 2024).
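The early-out can be illustrated with the sketch below. It assumes that shading of a half-block stops once every pixel in it has reached the threshold; that trigger condition, the layer-based input layout, and the function name are assumptions, while $\epsilon = 1/128$ is the value quoted above.

```python
from typing import List, Sequence, Tuple

EPSILON = 1.0 / 128.0


def composite_half_block(layers: Sequence[Sequence[float]],
                         epsilon: float = EPSILON) -> Tuple[List[float], int]:
    """Accumulate alpha for all pixels of a half-block, front to back.

    `layers` is a front-to-back list; each layer holds one alpha per pixel.
    Returns the accumulated alphas and the number of layers actually shaded.
    """
    num_pixels = len(layers[0]) if layers else 0
    a_acc = [0.0] * num_pixels
    shaded = 0
    for layer in layers:
        # Early-out: nothing behind a saturated half-block can contribute visibly.
        if all(a >= 1.0 - epsilon for a in a_acc):
            break
        for p, alpha in enumerate(layer):
            a_acc[p] += (1.0 - a_acc[p]) * alpha
        shaded += 1
    return a_acc, shaded


if __name__ == "__main__":
    pixels = 8 * 4                              # one half-block
    dense = [[0.9] * pixels for _ in range(10)]
    print(composite_half_block(dense)[1])       # layers shaded before saturation
```

With ten layers of alpha 0.9, accumulated alpha is 0.9, 0.99, then 0.999, so only three layers are shaded before the threshold $1 - 1/128 \approx 0.992$ is exceeded.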

In succinct raster representations, ordering and accumulation are governed by bitmaps $T$ , $T'$ , and $L$ in $k^2$ -tree variations, with rank/select primitives providing $O(1)$ navigation and $O(\log n)$ cell access, supporting efficient probabilistic, value, and span queries on general and time-evolving rasters (Brisaboa et al., 2019).
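The rank/select primitives can be sketched as follows. For readability this version answers queries from plain precomputed arrays, which is not succinct; real implementations use $o(n)$-bit auxiliary directories. The class name and the inclusive, 0-based rank convention are assumptions.

```python
from typing import List, Sequence


class RankSelectBitmap:
    """Bitmap with constant-time rank1/select1 via precomputed tables (not succinct)."""

    def __init__(self, bits: Sequence[int]):
        self.bits: List[int] = list(bits)
        self.prefix: List[int] = [0]          # prefix[i] = number of 1s in bits[0:i]
        for b in self.bits:
            self.prefix.append(self.prefix[-1] + b)
        self.ones: List[int] = [i for i, b in enumerate(self.bits) if b]

    def rank1(self, i: int) -> int:
        """Number of 1s in bits[0..i], inclusive."""
        return self.prefix[i + 1]

    def select1(self, j: int) -> int:
        """Position of the j-th 1, counting from j = 1."""
        return self.ones[j - 1]

    def children_start(self, pos: int, k: int = 2) -> int:
        """Start of the children of the internal node at `pos` in a level-order
        k^2-tree bitmap: rank1(pos) * k^2 (valid only when bits[pos] == 1)."""
        return self.rank1(pos) * k * k


if __name__ == "__main__":
    T = RankSelectBitmap([1, 0, 0, 1, 1, 0])
    print(T.rank1(3), T.select1(2), T.children_start(3))
```

Each step down the tree costs one rank operation, so a cell access touches one bit per level, which gives the $O(\log n)$ bound when rank is $O(1)$.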

3. Data Structures and Memory Utilization

LucidRaster relies on a per-triangle structure ($\sim$80 bytes/triangle) storing normals, edge functions, depth edge/flags, and AABBs, and per-quad structures (65 bytes/quad) for colors, texcoords, and bin metadata. Scratch buffers for raster subregions require $\sim$768 KB/workgroup. Total GPU memory for tested scenes reaches 1.2 GB, with memory cost modeled by

$M \approx N_{\mathrm{triangles}} \cdot 80\,\mathrm{B} + N_{\mathrm{quads}} \cdot 65\,\mathrm{B} + N_{\mathrm{bins}}\,(\text{offsets}+\text{counts}) + S_{\text{scratch}}$

Data-oriented attribute grouping, two-phase batching with prefix sums, and subgroup shuffles minimize cache and memory bandwidth usage (Jakubowski, 22 May 2024).
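The memory model can be turned into a small estimator. The per-triangle, per-quad, and per-workgroup constants are the figures quoted above. The per-bin byte count (a 4-byte offset plus a 4-byte count), the workgroup count, and the example scene sizes are assumptions chosen only to make the arithmetic concrete.

```python
import math

BYTES_PER_TRIANGLE = 80
BYTES_PER_QUAD = 65
SCRATCH_PER_WORKGROUP = 768 * 1024   # ~768 KB
BYTES_PER_BIN = 8                    # assumed: 4-byte offset + 4-byte count
BIN_SIZE = 32                        # 32x32-pixel bins


def num_bins(width: int, height: int, bin_size: int = BIN_SIZE) -> int:
    """Number of screen bins needed to cover a width x height framebuffer."""
    return math.ceil(width / bin_size) * math.ceil(height / bin_size)


def estimate_memory_bytes(n_triangles: int, n_quads: int,
                          n_bins: int, n_workgroups: int) -> int:
    """M ~ N_tri * 80 B + N_quad * 65 B + N_bins * (offsets + counts) + S_scratch."""
    return (n_triangles * BYTES_PER_TRIANGLE
            + n_quads * BYTES_PER_QUAD
            + n_bins * BYTES_PER_BIN
            + n_workgroups * SCRATCH_PER_WORKGROUP)


if __name__ == "__main__":
    bins = num_bins(1920, 1080)
    total = estimate_memory_bytes(1_000_000, 500_000, bins, n_workgroups=64)
    print(bins, f"{total / 2**20:.1f} MiB")
```

In this hypothetical scene the geometry terms dominate, followed by scratch space, while the bin bookkeeping is negligible.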

In compact raster representations, structures such as $k^2$ -trees require

$S_{\mathrm{bin}} = |T| + |T'| + |L| = \sum_{i=0}^{h-1} N_i + N_h + o(\sum N_i + N_h)$

bits, optimally $O(n^2/(k^2-1))$ in the dense case. For general rasters with $\sigma$ values, multi-tree and interleaved strategies require between $1.83$ and $2.53$ bits/cell (from the $k^3$-tree to the interleaved variant), competitive with compressed GeoTIFF for moderate alphabets (Brisaboa et al., 2019).

4. Query and Compositional Methods

Query patterns in differentiable probabilistic rasterization include point-sampling, spatial range selection, value-based retrieval, and temporal queries over time-evolving rasters. In LucidRaster, sampled fragment data is composited using the probabilistic blend equation above, with exact order maintained by the two-stage sorting. The front-to-back accumulation strategy aligns with hardware standards.

Succinct raster representations offer $O(\log n)$ cell queries, and $O(\mathrm{occ}+\log n)$ range and value window queries using rank/select. For general rasters, the cumulative $k^2$ (CM) and $k^3$ -tree enable efficient interval and set queries vital for probabilistic modeling of raster fields over space, value, and time. For time-evolving rasters, interleaved structures ( $I$ ) and $k^3$ -trees enable $O(\log n+\log T)$ access and $O(\mathrm{occ}+\log n+\log T)$ interval queries (Brisaboa et al., 2019).
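To make the cell and range queries concrete, here is a minimal binary $k^2$-tree in Python. It is a sketch under simplifying assumptions: the matrix side is a power of $k$, bitmaps are plain lists with a prefix-sum rank table instead of succinct rank structures, and only a single binary raster is covered (general and time-evolving rasters combine several such trees or use a $k^3$-tree). All names are hypothetical.

```python
from typing import List, Sequence, Tuple


class K2Tree:
    """Level-order k^2-tree over a binary n x n matrix (n a power of k, n >= k)."""

    def __init__(self, matrix: Sequence[Sequence[int]], k: int = 2):
        self.k, self.n = k, len(matrix)
        T: List[int] = []                     # internal levels
        L: List[int] = []                     # last level (individual cells)
        level = [(0, 0, self.n)]              # (row, col, size) of nodes to split
        while level:
            next_level = []
            for r0, c0, size in level:
                sub = size // k
                for i in range(k):
                    for j in range(k):
                        r, c = r0 + i * sub, c0 + j * sub
                        bit = int(any(matrix[y][x]
                                      for y in range(r, r + sub)
                                      for x in range(c, c + sub)))
                        if sub == 1:
                            L.append(bit)
                        else:
                            T.append(bit)
                            if bit:
                                next_level.append((r, c, sub))
            level = next_level
        self.T, self.L, self.bits = T, L, T + L
        self.prefix = [0]                     # prefix[i] = number of 1s in T[0:i]
        for b in T:
            self.prefix.append(self.prefix[-1] + b)

    def _children(self, pos: int) -> int:
        return self.prefix[pos + 1] * self.k * self.k   # rank1(T, pos) * k^2

    def get(self, row: int, col: int) -> int:
        """Cell access: one bit inspected per level."""
        base, size = 0, self.n
        while True:
            sub = size // self.k
            pos = base + (row // sub) * self.k + (col // sub)
            if not self.bits[pos]:
                return 0
            if sub == 1:
                return 1
            row, col = row % sub, col % sub
            base, size = self._children(pos), sub

    def range_query(self, r1: int, r2: int, c1: int, c2: int) -> List[Tuple[int, int]]:
        """Report set cells in [r1, r2] x [c1, c2], pruning empty or disjoint subtrees."""
        out: List[Tuple[int, int]] = []

        def visit(base: int, size: int, r0: int, c0: int) -> None:
            sub = size // self.k
            for i in range(self.k):
                for j in range(self.k):
                    r, c = r0 + i * sub, c0 + j * sub
                    if r > r2 or r + sub - 1 < r1 or c > c2 or c + sub - 1 < c1:
                        continue
                    pos = base + i * self.k + j
                    if not self.bits[pos]:
                        continue
                    if sub == 1:
                        out.append((r, c))
                    else:
                        visit(self._children(pos), sub, r, c)

        visit(0, self.n, 0, 0)
        return out


if __name__ == "__main__":
    m = [[1, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 1, 1],
         [0, 0, 0, 1]]
    tree = K2Tree(m)
    print(tree.T, tree.L)
    print(tree.get(2, 3), tree.range_query(2, 3, 2, 3))
```

The range query descends only into subtrees that are both non-empty and overlap the window, which is where the output-sensitive behavior comes from.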

5. Performance Analysis

Empirical performance of LucidRaster indicates a mean slowdown of $\sim 3.3\times$ compared to hardware alpha blending, which is competitive with or better than other high-quality OIT approximations, especially in triangle-dense or depth-complex scenes. Depth-filter size $F$ allows for explicit accuracy-speed tradeoffs: $F=3$ yields $<0.2\%$ invalid pixels, while increasing $F$ to $8$ reduces the error to $0.02\%$ at $2\%$ overhead. The alpha-threshold early-out can accelerate opaque scenes by up to $30\%$, although the gain is scene-dependent.

For compact in-memory raster representations, the $k^3$ -tree attains $1.6$– $2.9~\mu$ s point queries in general rasters, and time-evolving representations ( $I$ and $k^3$ ) consistently provide interval queries in $2$– $9~\mu$ s. These structures are up to $10\times$ smaller than traditional linear quadtrees in the binary case and answer range/select queries orders of magnitude faster than compressed GeoTIFF (Brisaboa et al., 2019).

Summary of performance, as reported in (Jakubowski, 22 May 2024) and (Brisaboa et al., 2019):

Method	Space (bits/cell)	Point Query ( $\mu$ s)	Range Query
LucidRaster	--	--	Front-to-back $O(1)$ per pixel
$k^3$ -tree (general)	1.83	1.6–2.9	2–9 $\mu$ s
$k^2$ -ones (binary)	0.03–0.04	0.29–0.41	3.4 $\mu$ s
QT-static (binary)	0.25	0.88–1.10	0.9 $\mu$ s
GeoTIFF-comp (general)	1.12	460–500	>500 $\mu$ s

6. Limitations and Prospective Extensions

LucidRaster’s memory footprint is substantial ($\sim$1.2 GB); dynamic memory management could reduce the amount of allocated but unused capacity. The fixed bin size (32×32 pixels) could be adapted to scene density. No MSAA is currently implemented; enabling 4×–16× MSAA would increase interval storage but is feasible in the blend phase. The current default depth-filter size ($F=3$) can be increased to further reduce blending artifacts, but at additional cost. Concurrent execution of low- and high-raster paths may improve utilization for irregular scenes (Jakubowski, 22 May 2024).

For succinct raster representations, scalability for very high-dimensional ( $>\!3$ D) data, or rapidly changing time-evolving environments, may challenge the practical benefits of $k^3$ -tree or interleaved strategies. Increasing value or time domain size correlates directly with storage and navigation complexity. The optimal structure in practice depends crucially on the sparsity and change regime of the underlying raster data (Brisaboa et al., 2019).

7. Broader Context and Applications

Differentiable probabilistic rasterization forms a core computational substrate for real-time graphics, scientific visualization, and spatial data analysis, with direct relevance to order-independent transparency, light transport, and probabilistic spatial modeling in GIS. The LucidRaster pipeline demonstrates efficient, exact OIT for high-complexity scenes, while succinct raster representations with $k^2$ -tree and $k^3$ -tree enable high-speed, space-efficient in-memory analytics for raster fields over space, value, and time.

The methods surveyed enable:

Exact, real-time compositing of transparent surfaces in GPU graphics pipelines (Jakubowski, 22 May 2024).
Microsecond-scale point and range queries on raster fields over space, value, and time (sub-microsecond point queries in the binary case), enabling probabilistic and top-$k$ analytics in GIS and imaging pipelines (Brisaboa et al., 2019).

A plausible implication is that advances in GPU software rasterization and succinct data structures may converge, enabling future rasterization and spatial analytics engines that support differentiability (for machine learning integration), probabilistic operations, and expressive query semantics at scale.