Long-Context Modeling via Morton Serialization

Updated 30 March 2026

Morton serialization interleaves the bits of multidimensional coordinates, which preserves much of their spatial locality and makes long-context sequence modeling practical.
State-space and autoregressive models built on this ordering report gains in image restoration (measured in PSNR) and 3D compression (measured in BD-Rate).
The approach supports efficient processing of high-dimensional data through shorter context paths and better memory locality in both 2D and 3D applications.

Long-context modeling via Morton serialization is an approach to neural sequence modeling over spatial domains (images, volumes, or point clouds) in which the native high-dimensional spatial grid is mapped to a one-dimensional sequence using Morton (Z-order) codes. This serialization largely preserves spatial locality while allowing computationally efficient recurrent or autoregressive processing. It has been used for long-range dependency modeling in both image restoration and 3D compression pipelines, with benefits for state-space modeling, large-context aggregation, and entropy coding. Recent work reports that Morton-order scans outperform raster scans both in grouping local neighborhoods and in making large context windows over high-dimensional data practical (Wang et al., 23 May 2025; Liu et al., 30 Nov 2025).

1. Principles of Morton Serialization

Morton serialization, or Z-order curve mapping, encodes multidimensional spatial coordinates into a single scalar index by interleaving the bits of each coordinate. For 2D coordinates $(i,j)$, the Morton index $t$ is computed by alternating the bits of $j$ and $i$. For 3D coordinates $(x,y,z)$, the index $\pi$ interleaves bits from all three axes:

$\pi = \sum_{k=0}^{d-1} \left[ (\operatorname{bit}(x,k) \ll 3k) + (\operatorname{bit}(y,k) \ll (3k+1)) + (\operatorname{bit}(z,k) \ll (3k+2)) \right]$

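where $d$ is the bit width of each quantized spatial axis and $\operatorname{bit}(x,k)$ denotes the $k$-th bit of $x$, with $k=0$ the least significant. Because of this bit interleaving, spatially proximate points in the multidimensional lattice usually map to nearby indices in the 1D sequence, although the curve makes occasional long jumps at the boundaries of larger blocks. Morton coding is computationally efficient, requiring only bit manipulation.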
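The 3D formula above can be sketched directly as a loop over bit positions. This is a minimal illustration, not code from either paper; the function names `morton3d` and `morton3d_decode` and the default `d=10` are assumptions chosen for the example.

```python
def morton3d(x: int, y: int, z: int, d: int = 10) -> int:
    """Interleave the low d bits of x, y, z.

    Bit k of x goes to position 3k, bit k of y to 3k+1, bit k of z to 3k+2,
    matching the summation formula for pi in the text.
    """
    code = 0
    for k in range(d):
        code |= ((x >> k) & 1) << (3 * k)
        code |= ((y >> k) & 1) << (3 * k + 1)
        code |= ((z >> k) & 1) << (3 * k + 2)
    return code


def morton3d_decode(code: int, d: int = 10) -> tuple:
    """Invert morton3d by collecting every third bit back into each axis."""
    x = y = z = 0
    for k in range(d):
        x |= ((code >> (3 * k)) & 1) << k
        y |= ((code >> (3 * k + 1)) & 1) << k
        z |= ((code >> (3 * k + 2)) & 1) << k
    return x, y, z
```

Since the bit positions of the three axes never overlap, the sum in the formula and the bitwise OR used here give the same result, and the mapping is invertible for coordinates below $2^d$.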

Morton-ordered serialization creates a 1D order in which any $2\times2$ block (or $2\times2\times2$ block in 3D) whose origin lies at even coordinates maps to a consecutive subsequence; the same holds recursively for aligned $2^k\times2^k$ blocks. Blocks that straddle these alignment boundaries are not guaranteed to be contiguous. This property underpins both local neighborhood preservation and efficient state-space or autoregressive model application at scale (Wang et al., 23 May 2025; Liu et al., 30 Nov 2025).
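The block-contiguity property can be checked with a short sketch. It holds for blocks whose top-left corner is aligned to the block size, and the example also shows an unaligned block that does not stay contiguous. The bit assignment (bits of $j$ in even positions, bits of $i$ in odd positions) is an assumption, since the text only says the bits alternate; `morton2d`, `block_indices`, and `is_contiguous` are illustrative names.

```python
def morton2d(i: int, j: int, bits: int = 16) -> int:
    """2D Morton index; assumes j fills even bit positions and i odd ones."""
    t = 0
    for k in range(bits):
        t |= ((j >> k) & 1) << (2 * k)
        t |= ((i >> k) & 1) << (2 * k + 1)
    return t


def block_indices(i0: int, j0: int, size: int = 2) -> list:
    """Sorted Morton indices of the size x size block with top-left corner (i0, j0)."""
    return sorted(
        morton2d(i0 + di, j0 + dj) for di in range(size) for dj in range(size)
    )


def is_contiguous(indices: list) -> bool:
    """True if the sorted, distinct indices form one unbroken run."""
    return indices[-1] - indices[0] == len(indices) - 1


aligned = block_indices(2, 4)    # corner at even coordinates
unaligned = block_indices(1, 1)  # corner at odd coordinates
print(aligned, is_contiguous(aligned))
print(unaligned, is_contiguous(unaligned))
```

Under this bit assignment the aligned block occupies four consecutive positions, while the unaligned block is scattered across the sequence.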

2. Long-Context Modeling in 2D: State-Space Models with Morton Scan

The MODEM framework for adverse weather image recovery exemplifies Morton serialization for 2D spatial sequences (Wang et al., 23 May 2025). Its core module, MOS2D (Morton-Order 2D-Selective-Scan Module), flattens each spatial feature grid $F\in\mathbb{R}^{H\times W\times C}$ in Morton order to yield a sequence $x_t$ amenable to sequential processing. Compared with a raster scan, this ordering keeps spatial neighbors, including those in adjacent rows, closer together in the sequence, which benefits state-based models that rely on local context.
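A minimal sketch of Morton-order flattening for an $H\times W$ grid of feature vectors follows. It is not MODEM's implementation: it uses plain nested lists instead of tensors, sorts positions by their Morton index so that grid sizes that are not powers of two still work, and assumes the bits of $j$ occupy even positions. All function names are illustrative.

```python
def morton2d(i: int, j: int, bits: int = 16) -> int:
    """2D Morton index; assumes j fills even bit positions and i odd ones."""
    t = 0
    for k in range(bits):
        t |= ((j >> k) & 1) << (2 * k)
        t |= ((i >> k) & 1) << (2 * k + 1)
    return t


def morton_flatten(grid):
    """Flatten an H x W grid (nested lists) into a Morton-ordered sequence.

    Returns the sequence and the list of (i, j) positions in scan order,
    which is needed to scatter results back onto the grid.
    """
    H, W = len(grid), len(grid[0])
    order = sorted(
        ((i, j) for i in range(H) for j in range(W)),
        key=lambda p: morton2d(p[0], p[1]),
    )
    return [grid[i][j] for i, j in order], order


def morton_unflatten(seq, order, H, W):
    """Inverse of morton_flatten: place sequence items back at their positions."""
    grid = [[None] * W for _ in range(H)]
    for value, (i, j) in zip(seq, order):
        grid[i][j] = value
    return grid
```

The scan order returned by `morton_flatten` depends only on the grid shape, so in a real pipeline it could be computed once per resolution and reused as a gather index.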

A discrete state-space model (SSM) is then run along $t$:

$h_t = \bar A_t\,h_{t-1} + \bar B_t\,x_t$

$y_t = C_t\,h_t + D\,x_t$

with $\bar A_t, \bar B_t$ dynamically conditioned by global and local degradation priors. The global descriptor $Z_0$ modulates coarse features, while a spatial kernel $Z_1$ provides local adaptation via attention-style modulation. Parameter matrices are adapted per step and discretized using zero-order hold, facilitating recurrent computation with context-aware gating.
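The recurrence can be illustrated with a scalar-state sketch that applies zero-order-hold discretization at every step. This is a simplification under stated assumptions: the state is a single number rather than a vector, the per-step parameters `bs`, `cs`, and `deltas` are supplied directly instead of being predicted from the degradation priors $Z_0$ and $Z_1$, and all names are hypothetical.

```python
import math


def zoh_discretize(a: float, b: float, delta: float):
    """Zero-order hold for a scalar continuous system h' = a*h + b*x.

    a_bar = exp(delta * a); b_bar = (a_bar - 1) / a * b, with the limit
    delta * b used when a == 0.
    """
    a_bar = math.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b if a != 0 else delta * b
    return a_bar, b_bar


def ssm_scan(xs, a, bs, cs, deltas, d=0.0):
    """Run h_t = a_bar_t * h_{t-1} + b_bar_t * x_t and y_t = c_t * h_t + d * x_t.

    xs is the serialized input (for example, one channel in Morton order);
    bs, cs, deltas hold the step-dependent parameters.
    """
    h = 0.0
    ys = []
    for x, b, c, delta in zip(xs, bs, cs, deltas):
        a_bar, b_bar = zoh_discretize(a, b, delta)
        h = a_bar * h + b_bar * x
        ys.append(c * h + d * x)
    return ys
```

With a negative `a`, the state decays between steps, so `delta` acts as a gate that controls how much of the earlier sequence is retained at each position.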

Morton serialization enables the SSM recurrence to capture both short-range and long-range spatial dependencies: state updates propagate through adjacent spatial regions within $O(\log N)$ hops, compared to $O(W)$ for raster scans, effectively reducing the context path length and improving global context aggregation. The SSM complexity remains $O(NC^2)$, with improved cache usage due to memory locality.
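One simple way to see the difference between the two scans is to measure how far apart vertically adjacent pixels end up in the 1D sequence. The sketch below does not reproduce the $O(\log N)$ argument; it only computes this sequence gap for a raster scan and for a Morton scan, assuming the bits of $j$ occupy even positions. The helper names are illustrative.

```python
from statistics import median


def morton2d(i: int, j: int, bits: int = 16) -> int:
    """2D Morton index; assumes j fills even bit positions and i odd ones."""
    t = 0
    for k in range(bits):
        t |= ((j >> k) & 1) << (2 * k)
        t |= ((i >> k) & 1) << (2 * k + 1)
    return t


def vertical_gaps(H: int, W: int, index) -> list:
    """Sequence distance between each pixel (i, j) and the pixel below it."""
    return [
        abs(index(i + 1, j) - index(i, j))
        for i in range(H - 1)
        for j in range(W)
    ]


def compare_scans(H: int, W: int) -> dict:
    """Median and maximum vertical-neighbor gap for raster and Morton scans."""
    raster = vertical_gaps(H, W, lambda i, j: i * W + j)
    morton = vertical_gaps(H, W, morton2d)
    return {
        "raster": (median(raster), max(raster)),
        "morton": (median(morton), max(morton)),
    }
```

In a raster scan every vertical neighbor is exactly $W$ steps away. In a Morton scan more than half of the vertical neighbors are only 2 steps away, but pairs that cross the boundary of a large block are separated by more than $W$ steps, so the typical gap shrinks while the worst case grows.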

Quantitative ablations in MODEM show that adding the Morton scan improves PSNR by up to 0.32 dB across challenging datasets, and that the best restoration results are obtained only by the full configuration, which combines Morton ordering with both global and local conditioning (Wang et al., 23 May 2025).

3. Morton Serialization in 3D: Large-Scale Context for Gaussian Splatting Compression

In 3D Gaussian Splatting compression, the LocoMoco framework utilizes Morton serialization for scalable and spatially coherent long-context modeling (Liu et al., 30 Nov 2025). Each Gaussian centroid $\mu=(x,y,z)$ is quantized and sorted by its 1D Morton code $\pi$. The sorted sequence is partitioned into nonoverlapping windows of size $L$ (default $L=1024$), where each window encompasses spatial neighbors, enabling blockwise context aggregation over $\mathcal{O}(10^3)$ primitives.
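The quantize, sort, and partition steps can be sketched as follows. The quantization scheme (uniform scaling of each axis of the bounding box onto a $d$-bit grid) is an assumption, since the text does not specify it, and `quantize` and `morton_windows` are hypothetical names rather than LocoMoco's API.

```python
def morton3d(x: int, y: int, z: int, d: int = 10) -> int:
    """Interleave the low d bits of x, y, z (x lowest, then y, then z)."""
    code = 0
    for k in range(d):
        code |= ((x >> k) & 1) << (3 * k)
        code |= ((y >> k) & 1) << (3 * k + 1)
        code |= ((z >> k) & 1) << (3 * k + 2)
    return code


def quantize(points, d: int = 10):
    """Map float centroids onto a d-bit integer grid (assumed: per-axis min-max scaling)."""
    lo = [min(p[a] for p in points) for a in range(3)]
    hi = [max(p[a] for p in points) for a in range(3)]
    levels = (1 << d) - 1
    return [
        tuple(
            round((p[a] - lo[a]) / (hi[a] - lo[a]) * levels) if hi[a] > lo[a] else 0
            for a in range(3)
        )
        for p in points
    ]


def morton_windows(points, L: int = 1024, d: int = 10):
    """Sort point indices by Morton code and split them into nonoverlapping windows.

    Returns a list of windows; each window is a list of indices into `points`.
    The last window may hold fewer than L points.
    """
    codes = [morton3d(q[0], q[1], q[2], d) for q in quantize(points, d)]
    order = sorted(range(len(points)), key=codes.__getitem__)
    return [order[s:s + L] for s in range(0, len(order), L)]
```

Returning indices instead of reordered points keeps the original array untouched, which makes it easy to gather the other per-Gaussian attributes with the same permutation.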

This 1D ordering facilitates efficient deployment of attention-based transform coders and autoregressive entropy models, both of which can operate over large, locality-preserving context windows. Each block is processed by a positional encoder and self-attention layers; the resulting latents serve as inputs to downstream entropy coders that exploit not only channel-wise but also spatial dependencies, following a fine-grained space–channel order.

Ablation studies confirm that shrinking the context length below $L=1024$ (e.g., $L=128$ or $L=16$) incurs severe BD-Rate penalties (+7.97% and +27.18% respectively), while increases beyond $L=1024$ yield diminishing improvements (–1.79% at $L=2048$). This empirically substantiates the value of Morton-serialized large windows for long-range dependency capture in high-dimensional data (Liu et al., 30 Nov 2025).

4. Space-Channel Autoregressive Models with Morton-Ordered Context

LocoMoco further exploits Morton ordering in its “space-channel” autoregressive entropy model. Within each window, the sequence of quantized Gaussians $\{n_i\}$ is factorized into channel subgroups and spatial anchor groups, with the probability expressed as:

$p(\Theta) = \prod_{j=1}^J \prod_{i=1}^L p(n_i^j \mid \text{context})$

Contexts are aggregated using both channel attention over decoded subgroups and spatial attention within anchor groups.
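Under an ideal entropy coder, a factorized model of this form implies a code length equal to the negative base-2 log-likelihood summed over all channel subgroups and positions. The sketch below computes that quantity from a table of model probabilities. It assumes `probs[j][i]` holds the probability the model assigned to the symbol actually observed at position $i$ in subgroup $j$; how LocoMoco forms the context is not modeled, and the function name is illustrative.

```python
import math


def code_length_bits(probs) -> float:
    """Ideal code length in bits for a factorized model.

    probs[j][i] is p(n_i^j | context) evaluated at the observed symbol, so
    the total is -sum_j sum_i log2 p(n_i^j | context).
    """
    return -sum(math.log2(p) for group in probs for p in group)


# Two channel subgroups, three positions each, every symbol predicted at p = 0.5.
print(code_length_bits([[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]))  # 6.0
```

Better context modeling raises the probabilities assigned to the observed symbols, which directly lowers this bit count.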

The Morton ordering ensures that spatial neighborhood information is readily accessible and efficiently encoded within each window, supplying the autoregressive model with the necessary spatial correlations, crucial for both compression efficiency and faithful representation across scenes (Liu et al., 30 Nov 2025).

5. Conditioning and Adaptive Modulation in Morton-Ordered Models

Both MODEM and LocoMoco incorporate mechanisms for adaptive model conditioning, leveraging the structure provided by Morton serialization. In MODEM, global and local priors—extracted by the Dual Degradation Estimation Module (DDEM)—modulate (via DAFM and DSAM) both the states and transformations in the SSM. The global prior $Z_0$ affects all spatial positions, denoting degradation type and severity, while the spatially adaptive prior $Z_1$ encodes fine-grained, location-dependent attributes such as artifact structure.

In LocoMoco, attention-based transform coding yields latents that encode not only spatial geometry but also the context-relevant dependencies for downstream symbol probability estimation. Hyperpriors derived from these Transformer representations further guide the entropy coder, enhancing rate-distortion performance and compact representation.

Context-aware modulation steered by Morton-ordered sequences ensures that priors and attention are applied to meaningful spatial groupings, facilitating both local feature preservation and global structure modeling.

6. Empirical Performance and Generalizability

MODEM, utilizing Morton serialization within MOS2D, achieves state-of-the-art PSNR across multiple adverse weather restoration tasks: Snow100K-S (38.08 dB), Snow100K-L (32.52 dB), Outdoor (33.10 dB), RainDrop (33.01 dB), with quantifiable improvements over non-Morton baselines and clear ablation evidence for each component (see Table 6 in Wang et al., 23 May 2025).

LocoMoco attains a 20× compression ratio (from ~372 MB to ~19 MB at PSNR ≈29 dB) for 3D Gaussian Splatting data. It generalizes without per-scene fine-tuning, achieving BD-Rate reductions of –9.4% (Mip-NeRF 360) and –10.4% (Tanks & Temples) relative to FCGS, demonstrating state-of-the-art generalizable 3DGS compression (Liu et al., 30 Nov 2025). These results are traceable to the ability of Morton-ordered long-context windows to encode wide-range dependencies critical for complex spatial structures.

7. Computational and Practical Considerations

For a fixed coordinate bit width, Morton code computation involves only $O(1)$ bitwise operations per coordinate, and the resulting order is highly cache efficient due to its spatial locality. In SSM-based and autoregressive models, the reduced path length between spatial neighbors expedites recurrence and attention propagation, leading to both algorithmic and hardware-level improvements (e.g., improved memory locality). The windowed processing paradigm enabled by Morton serialization allows models to scale efficiently to very large inputs or datasets, as demonstrated by batching $\mathcal{O}(10^6)$ Gaussians into manageable windows for feed-forward inference and compression (Liu et al., 30 Nov 2025).
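The constant-operation claim can be made concrete with the well-known mask-and-shift ("magic bits") technique, which spreads a 10-bit coordinate across 30 bits in four steps instead of looping over bits. This is a generic sketch of that technique and is not taken from either paper; the function names are illustrative, and the loop version is included only as a reference for comparison.

```python
def part1by2(v: int) -> int:
    """Spread the low 10 bits of v so that bit k lands at position 3k."""
    v &= 0x000003FF
    v = (v | (v << 16)) & 0xFF0000FF
    v = (v | (v << 8)) & 0x0300F00F
    v = (v | (v << 4)) & 0x030C30C3
    v = (v | (v << 2)) & 0x09249249
    return v


def morton3d_fast(x: int, y: int, z: int) -> int:
    """30-bit Morton code from three 10-bit coordinates (x lowest, then y, then z)."""
    return part1by2(x) | (part1by2(y) << 1) | (part1by2(z) << 2)


def morton3d_loop(x: int, y: int, z: int, d: int = 10) -> int:
    """Reference implementation: one loop iteration per bit."""
    code = 0
    for k in range(d):
        code |= ((x >> k) & 1) << (3 * k)
        code |= ((y >> k) & 1) << (3 * k + 1)
        code |= ((z >> k) & 1) << (3 * k + 2)
    return code
```

The fast version performs a fixed number of shifts and masks per coordinate regardless of the coordinate value, and the same pattern vectorizes naturally over arrays of points.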

A plausible implication is that Morton serialization's coupling of spatial coherence with 1D sequence modeling will facilitate further advances in long-context modeling for multidimensional data across restoration, compression, and representation learning tasks.


References

Wang et al. (23 May 2025). MODEM: A Morton-Order Degradation Estimation Mechanism for Adverse Weather Image Recovery.

Liu et al. (30 Nov 2025). Feed-Forward 3D Gaussian Splatting Compression with Long-Context Modeling.
