
SConvTransform: Spherical, Compiler, and Equivariant Methods

Updated 29 November 2025
  • SConvTransform is a family of mathematically principled convolution transforms that address spherical data, compiler-level optimizations, and equivariant deep learning.
  • It achieves efficiency through diagonal harmonic space operations, optimized cache tiling, and steerable self-attention mechanisms.
  • Empirical studies demonstrate anisotropic filtering of Earth topography, near-peak direct-convolution performance from the MLIR lowering pipeline, and improved accuracy in equivariant models.

SConvTransform refers to a family of methodologies and operator definitions in contemporary computational science, encompassing (i) a spherical convolution transform (the “sifting convolution”) for data on the sphere, (ii) compiler-level sliced convolution lowering for efficient direct convolution, and (iii) steerable transformer architectures that integrate steerable convolutions and group-equivariant attention. Each instantiation targets a distinct technical context—geometric signal analysis, MLIR/LLVM compilation, and equivariant deep learning on $\mathrm{SE}(d)$—while sharing the pursuit of structural efficiency and mathematical fidelity.

1. Sifting Convolution on the Sphere

The sifting convolution, also denoted as the SConvTransform, is a spherical convolution operation characterized by the use of a sifting (translation) operator grounded in the harmonic structure of the 2-sphere $S^2$ (Roddy et al., 2020). For square-integrable functions $f, g \in L^2(S^2)$, the sifting convolution is defined as

$$(f \star_s g)(\omega_0) = \langle \mathcal{T}_{\omega_0} f,\, g\rangle_{L^2(S^2)} = \int_{S^2} d\Omega(\omega)\;(\mathcal{T}_{\omega_0}f)(\omega)\, g^*(\omega).$$

Here, the translation operator $\mathcal{T}_{\omega_0}$ is analogized from the Euclidean case through its action in harmonic space:

$$(\mathcal{T}_{\omega_0} Y_{\ell m})(\omega) = Y_{\ell m}(\omega_0)\, Y_{\ell m}(\omega),$$

extended linearly. The corresponding harmonic space representation of the convolution is a diagonal product:

$$(f \star_s g)_{\ell m} = f_{\ell m}\, g_{\ell m}^*.$$

This diagonalization in $(\ell, m)$ coefficients makes the sifting convolution computationally efficient, requiring only two spherical harmonic transforms and a pointwise multiplication, with total cost $O(L^3)$ for bandlimit $L$.
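As a concrete illustration, the harmonic-space product can be written in a few lines of Python. This is a minimal sketch, assuming a flat $(\ell, m)$ coefficient layout; the forward and inverse spherical harmonic transforms are treated as placeholders (`sht_forward` and `sht_inverse` are hypothetical names standing in for a library such as ssht or healpy).

```python
import numpy as np

def sift_convolve_harmonic(flm: np.ndarray, glm: np.ndarray) -> np.ndarray:
    """Sifting convolution in harmonic space: (f *_s g)_{lm} = f_{lm} * conj(g_{lm}).

    flm, glm: 1-D complex coefficient arrays for a common bandlimit L, indexed
    flat as idx = l*l + l + m (one frequently used layout; other libraries
    order coefficients differently).
    """
    assert flm.shape == glm.shape
    return flm * np.conj(glm)

# Hypothetical end-to-end usage with an external spherical-harmonic library:
#   flm = sht_forward(f_map, L)              # O(L^3) forward transform
#   glm = sht_forward(g_map, L)              # O(L^3) forward transform
#   hlm = sift_convolve_harmonic(flm, glm)   # O(L^2) pointwise product
#   h_map = sht_inverse(hlm, L)              # O(L^3), output indexed by S^2
```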

Key properties of this construction include the ability to use fully directional kernels (no axisymmetry restriction), outputs indexed by $S^2$ (rather than $SO(3)$), and commutativity up to complex conjugation, i.e., $f \star_s g = (g \star_s f)^*$. This framework enables anisotropic filtering on the sphere, as demonstrated by directional harmonic-Gaussian smoothing of Earth topography.

Relative to other spherical convolutions, only the sifting convolution simultaneously supports arbitrary kernel directionality, output on $S^2$, and strict $O(L^3)$ complexity, thus establishing a unique niche for spherical data analysis applications (Roddy et al., 2020).

2. Compiler-Guided Sliced Convolution in MLIR (SConvTransform Operator)

In the domain of machine learning compilation, SConvTransform designates a declarative Transform-dialect operator for optimizing 2D convolutions within MLIR, adhering to a fully analyzable pipeline (Ferrari et al., 22 Nov 2025). The main operation, SConvOp, lowers a high-level linalg.conv2d operation into a tiled, packed, and bufferized sequence through the following pipeline:

  1. Convolution normalization and generic op legalization—pattern-matching and collapsing spatial loops.
  2. Convolution Slicing Analysis (CSA)—analytically computes tile sizes $(N_c, K_2, K_3)$ for the reduction-channel, output-channel, and linearized window dimensions, targeting L1/L2/L3 capacities using:

$$N_c \leq \left\lfloor \frac{L1_{\text{bytes}}}{\text{sizeof(float)}\cdot(F_h\cdot F_w\cdot N_{\text{win}} + F_h\cdot F_w\cdot N_f)} \right\rfloor$$

and analogous expressions for $K_2$ and $K_3$ targeting the deeper cache levels (a worked tile-size sketch follows the pipeline list below).

  3. Edge-case splitting—remainder kernels are created as subkernels and handled with affine-map adjustments.
  4. Two-level structured tiling—outer level for cache blocking, inner level for microkernel exposure, using MLIR’s scf::tileUsingSCF and related dialect constructs.
  5. Packing and multipacking—affine equations specify filter and input reordering for maximal hardware utilization.
  6. Microkernel lowering—bufferized ops mapped to BLAS or custom microkernels at LLVM IR emission.
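To make the CSA bound concrete, the sketch below evaluates the $N_c$ inequality for given cache and filter parameters. It is illustrative only: the function name, the 32 KiB L1 size, and the per-tile window and filter counts are assumptions rather than values from the papers; $K_2$ and $K_3$ follow from analogous bounds against L2 and L3 capacities.

```python
def csa_tile_nc(l1_bytes: int, f_h: int, f_w: int,
                n_win: int, n_f: int, elem_bytes: int = 4) -> int:
    """Largest reduction-channel tile N_c such that one packed input-window
    slice plus one packed filter slice fit in L1, per the bound quoted above.

    n_win: linearized convolution windows kept resident per tile.
    n_f:   output filters (channels) kept resident per tile.
    """
    footprint_per_channel = elem_bytes * (f_h * f_w * n_win + f_h * f_w * n_f)
    return max(1, l1_bytes // footprint_per_channel)

# Illustrative numbers only: 32 KiB L1, 3x3 filters, 8 windows and 8 filters
# per tile -> N_c = 56 channels per L1-resident slice.
print(csa_tile_nc(l1_bytes=32 * 1024, f_h=3, f_w=3, n_win=8, n_f=8))
```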

The process remains agnostic to the target architecture except for explicit tile and vector sizes, which are encapsulated in user-supplied ArchInfo and MicroKernelInfo attributes. Experiments across ARM SME, Intel AVX512, and IBM POWER10 platforms reach up to 60% and 67% of peak performance, validating the combination of static schedule analysis with structure-preserving packing (Ferrari et al., 22 Nov 2025).

3. Convolution Slicing Analysis, Optimization, and Packing Strategies

SConvTransform implementations in compiler pipelines are centered on three interlocking strategies (Ferrari et al., 2023, Ferrari et al., 22 Nov 2025):

  • Convolution Slicing Analysis (CSA): Provides analytic tile-size selection along the key tensor axes to optimize for cache reuse and minimal DRAM traffic. A cost model selects between input-stationary (IS) and weight-stationary (WS) scheduling based on symbolic cache-miss and bandwidth minimization.
  • Convolution Slicing Optimization (CSO): Emits a multi-level loop nest with cache-aligned tiling, dynamic on-demand packing, and microkernel calls (a schematic loop nest is sketched after this list). This structure can be expressed either in C/C++ or directly as MLIR loop nests and is compatible with scf.for and other dialect-level control flow.
  • Vector-Based Packing (VBP): For unit-stride convolutions, efficient packing is achieved through vector register shift operations (e.g., VSX vsldoi on POWER10, AVX-512 _mm512_alignr_epi32 on x86), greatly reducing packing overhead by avoiding repeated loads and redundant memory traffic.
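The shape of a CSO-style loop nest can be illustrated with a deliberately simplified, runnable Python toy (unit stride, no padding, and a numpy matmul standing in for the BLAS/ISA microkernel). It mirrors only the structure of outer cache-blocking tiles, on-demand packing, and a microkernel call, not the production MLIR output or its performance.

```python
import numpy as np

def packed_direct_conv(inp, filt, nc_tile, k2_tile):
    """Toy illustration of a CSO-style loop nest (not the production pass).

    inp:  (C, H, W) input; filt: (F, C, Fh, Fw) filters; unit stride, no padding.
    nc_tile / k2_tile: reduction-channel and output-channel tile sizes.
    """
    C, H, W = inp.shape
    F, _, Fh, Fw = filt.shape
    Ho, Wo = H - Fh + 1, W - Fw + 1
    out = np.zeros((F, Ho, Wo), dtype=inp.dtype)

    for f0 in range(0, F, k2_tile):              # output-channel tile (K2)
        f1 = min(f0 + k2_tile, F)
        for c0 in range(0, C, nc_tile):          # reduction-channel tile (Nc)
            c1 = min(c0 + nc_tile, C)
            # Pack the filter slice on demand: (f_tile, c_tile*Fh*Fw).
            fpack = filt[f0:f1, c0:c1].reshape(f1 - f0, -1)
            # Pack the input slice on demand: (c_tile*Fh*Fw, Ho*Wo).
            cols = np.empty(((c1 - c0) * Fh * Fw, Ho * Wo), dtype=inp.dtype)
            idx = 0
            for c in range(c0, c1):
                for i in range(Fh):
                    for j in range(Fw):
                        cols[idx] = inp[c, i:i + Ho, j:j + Wo].ravel()
                        idx += 1
            # "Microkernel": in the real pipeline this is a BLAS or ISA-specific kernel.
            out[f0:f1] += (fpack @ cols).reshape(f1 - f0, Ho, Wo)
    return out

# Example: 8-channel 16x16 input, 4 filters of size 3x3, tiled 4 channels and
# 2 output filters at a time.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))
w = rng.standard_normal((4, 8, 3, 3))
y = packed_direct_conv(x, w, nc_tile=4, k2_tile=2)   # shape (4, 14, 14)
```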

In compiler toolchains such as ONNX-MLIR, these passes occur after convolution operation legalization and before final backend lowering. Integration with runtime libraries enables coupling to optimized BLAS or custom ISA-specific microkernels. Reported empirical results include model-inference speedups of 9%–25% (x86) and 10%–42% (POWER10), and packing-time reductions of up to 7.2× relative to Im2Col-based baselines (Ferrari et al., 2023).

4. Steerable SConvTransform Architectures in Equivariant Deep Learning

SConvTransform also identifies a class of steerable transformer networks operating on volumetric or manifold data with explicit group symmetry, particularly over $SE(2)$ and $SE(3)$ (Kundu et al., 24 May 2024). These architectures interleave steerable convolutional blocks with transformer-style self-attention acting on Fourier-space features corresponding to irreducible representations (irreps) of $SO(d)$.

Key Elements:

  • Steerable Feature Maps: Functions $f^{(\ell)}: \mathbb{R}^d \times SO(d) \to \mathbb{C}$, equivariant under $SE(d)$ rigid motions.
  • Fourier-space Representation: $f(x, \rho) = \int_{SO(d)} f(x, R)\, \rho(R)\, d\mu(R)$, with convolutional processing as pointwise matrix multiplications in the $\rho$-indexed channel.
  • Equivariant Attention: Queries, keys, and values are derived for each $\rho$ by learned embeddings, and attention weights are computed with steerable positional encodings (in 2D, $P(x,k) = \phi(r,k)\, e^{-i k \theta}$; in 3D, $P(x,\ell) = \phi(r,\ell)\, Y^\ell(\theta, \phi)$; see the sketch after this list).
  • Equivariant Nonlinearities: CG-nonlinearity (using Clebsch–Gordan decompositions) and H-nonlinearity (magnitude-based activation).
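As a concrete illustration of the 2D positional encoding, the sketch below evaluates $P(x,k) = \phi(r,k)\, e^{-i k \theta}$ with a fixed Gaussian radial profile (in the architecture this profile is learned) and checks numerically that rotating $x$ by an angle $\alpha$ multiplies $P$ by the weight-$k$ phase $e^{-i k \alpha}$; all concrete values are illustrative assumptions.

```python
import numpy as np

def steerable_pos_enc_2d(x, k, phi=lambda r, k: np.exp(-r**2)):
    """2D steerable positional encoding P(x, k) = phi(r, k) * exp(-i * k * theta).

    x:   array of shape (..., 2) holding planar coordinates.
    phi: radial profile; a fixed Gaussian here, a learnable function in practice.
    """
    r = np.linalg.norm(x, axis=-1)
    theta = np.arctan2(x[..., 1], x[..., 0])
    return phi(r, k) * np.exp(-1j * k * theta)

# Equivariance check: rotating x by alpha multiplies P by exp(-i*k*alpha),
# i.e. P transforms under the weight-k irrep of SO(2).
alpha, k = 0.7, 2
x = np.array([1.3, -0.4])
R = np.array([[np.cos(alpha), -np.sin(alpha)],
              [np.sin(alpha),  np.cos(alpha)]])
assert np.allclose(steerable_pos_enc_2d(R @ x, k),
                   np.exp(-1j * k * alpha) * steerable_pos_enc_2d(x, k))
```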

Empirical studies on Rotated MNIST and ModelNet10 demonstrate that hybrid architectures with SConvTransform attention outperform or match the state of the art, achieving accuracy improvements of 0.2%–0.3% over comparable steerable CNNs, with robust performance under full $SE(d)$ action on input data (Kundu et al., 24 May 2024). These gains are attained with manageable model size (e.g., 99.0% accuracy with 2.2M parameters at $k=8$ for Rotated MNIST).

5. Comparative Analysis and Context

The term SConvTransform encapsulates distinct but related innovations, each contributing to the state-of-the-art within its technical frame:

| Context | Primary Innovation | Reference | Key Distinction |
| --- | --- | --- | --- |
| Spherical Signal Proc. | Diagonal, direction-preserving convolution | (Roddy et al., 2020) | Only construction enabling general directional kernels on $S^2$ |
| Compiler Optimization | MLIR/LLVM-level cache- and ISA-aware convolution | (Ferrari et al., 22 Nov 2025; Ferrari et al., 2023) | End-to-end pipeline with explicit schedule, tiling, and affine packing |
| Equivariant DL | $SE(d)$-equivariant self-attention with steerable convs | (Kundu et al., 24 May 2024) | Equivariant transformer integrating $SO(d)$ Fourier structure |

For spherical data, SConvTransform provides the only $O(L^3)$, directionally-general convolution with $S^2$ output. In compiler optimization, it delivers quantifiable inference speedups and packing overhead reductions. In group-equivariant deep learning, the SConvTransform yields measurable improvements in accuracy through global self-attention on steerable features. The term thus denotes a class of methodologically rigorous, structurally efficient, and mathematically principled convolutional transforms or implementations across domains.

6. Extensibility and Future Prospects

The modular design of SConvTransform in compiler-based frameworks supports the incorporation of new microkernel backends, vectorized streaming packing, deeper nested tiling, and advanced convolution types (e.g., depthwise/grouped, fused ops, Winograd/AMX) with minimal disruption (Ferrari et al., 22 Nov 2025). For the spherical variant, applications in harmonic analysis, anisotropic filtering, and spherical wavelet constructions are immediate. In equivariant networks, further development may integrate more sophisticated learnable positional embeddings, deeper hierarchies, and extension to other symmetry groups.

This synthesis is based strictly on primary literature as identified above; all technical claims and empirical results are cited from the original papers.
