Intra/Inter Secondary Transforms (IST)
- IST is a class of secondary transforms that refine primary block-wise or vectorized transforms through non-separable processing for enhanced energy compaction.
- They employ learned non-separable kernels on selected low-frequency coefficients, boosting rate-distortion performance with minimal signaling overhead.
- Adapted for both video coding and neuromorphic/transformer architectures, ISTs optimize intra-token and inter-token processing for energy-efficient computation.
Intra/Inter Secondary Transforms (IST) refer to a class of signal and neural network transformations that enhance primary block-wise or vectorized transforms by applying additional, typically non-separable, processing stages. Initially developed for video coding—where they target both intra-frame (spatial prediction) and inter-frame (temporal prediction) residuals—IST principles have also been adapted in neuromorphic and transformer-based AI, where they formalize a systematic distinction between operations within elements of a vector (intra-token) and across vectors in a sequence (inter-token). IST modules aim to improve energy compaction, contextual processing, or energy-efficient computation by introducing a secondary transform or mixing stage after a primary transform or embedding, but restricted to carefully targeted subspaces or axes.
1. Formal Definitions and Theoretical Structure
In video coding, ISTs are non-separable transforms applied after a separable primary transform (such as DCT or ADST) and before quantization. The process involves applying a learned orthonormal kernel only to a small support of low-frequency coefficients, typically in the upper-left corner of the transformed block, to exploit residual correlations left by the primary transform while keeping arithmetic and signaling overhead minimal. Mathematically, for a primary-transformed block $C$, a mask operator $M$ selects a vector $c = M(C)$ containing the coefficients to which a kernel $K$ is applied: $y = Kc$. Only $y$ is quantized; inverse IST reconstructs the vector by $\hat{c} = K^{\top}\hat{y}$ and returns it into the original coefficient positions (Nalci et al., 6 Jan 2026, Pakiyarajah et al., 21 May 2025).
In neuromorphic AI and transformer-based architectures, ISTs are generalized to encompass intra-token (mixing channels/features within a token) and inter-token (mixing information across tokens) secondary transforms after the initial embedding. Intra-token transforms are typically channel-mixing operations such as pointwise feedforward or spiking neural networks applied independently to each token $x_t \in \mathbb{R}^{d}$: $z_t = f_{\text{intra}}(x_t)$. Inter-token transforms operate along the token dimension; for each channel $j$, $y_{1:T,j} = f_{\text{inter}}(z_{1:T,j})$.
Both stages preserve the notion of a primary/secondary distinction, with the latter targeting refinement or higher-level abstraction (Simeone, 1 Jan 2026).
2. Motivation and Algorithmic Rationale
Primary transforms in video codecs (DCT, ADST, path-graph KLT) are separable and tailored to efficiently decorrelate pixels in smooth or structured blocks, but they are sub-optimal for residuals featuring non-axis-aligned or highly directional textures. Applying a full non-separable KLT is infeasible due to memory and compute costs. IST addresses this by overlaying a small, non-separable transform specifically trained (e.g., via PCA or cluster-based KLT) on the significant low-frequency subspace, thereby enhancing energy compaction and reducing bitrate (Nalci et al., 6 Jan 2026, Pakiyarajah et al., 21 May 2025).
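As a back-of-envelope illustration of the cost gap, the following arithmetic (illustrative only; these counts are not taken from the cited papers) compares a full non-separable KLT against a separable primary transform plus an AV2-style secondary kernel:

```python
# Illustrative multiply counts for an 8x8 residual block (not from the papers).
N = 8
pixels = N * N                 # 64 coefficients

full_klt  = pixels * pixels    # one 64x64 non-separable matrix: 4096 multiplies
separable = 2 * N * pixels     # row pass + column pass with NxN matrices: 1024
ist_extra = 32 * 48            # AV2-style 32x48 secondary kernel: 1536 extra

# For a 32x32 block a full KLT needs a 1024x1024 matrix (~1M multiplies and
# ~1M stored weights per mode/cluster), while the IST kernel stays bounded
# at 48 input coefficients regardless of block size.
full_klt_32 = (32 * 32) ** 2
```

The memory cost is the sharper constraint: a distinct full KLT per prediction mode or cluster scales quadratically in block area, whereas the IST support is fixed.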
In neuromorphic architectures and energy-efficient AI, a similar principle holds: the initial token embedding disperses features into a vector space; secondary (IST) transforms then refine intra-token structure (via spiking neural elements or feedforward subnets) and build or update context along the inter-token axis (via state-space recurrences, attention, or neuromorphic approximation of these) (Simeone, 1 Jan 2026).
3. Mathematical Formulation and Implementation
Video Coding IST (AV2/AVM/VVC)
Let $C$ be the coefficient block post-primary transform. IST applies as follows:
- Define a support mask $M$: $M$ selects the $n$ low-frequency coefficients.
- Let $c = M(C) \in \mathbb{R}^{n}$, and apply the learned kernel $K \in \mathbb{R}^{m \times n}$ ($m \le n$): $y = Kc$.
- $y$ is quantized and encoded; on decode, obtain $\hat{y}$, reconstruct $\hat{c} = K^{\top}\hat{y}$, and insert back into the original coefficient positions via $M^{\top}$.
- $K$ is cluster- or mode-dependent, trained via offline residual statistics (e.g., non-separable KLT on cluster data) and orthonormalized such that $K K^{\top} = I$ (Nalci et al., 6 Jan 2026).
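The forward/inverse pipeline above can be sketched in a few lines of numpy. This is a minimal stand-in, not codec code: the kernel here is a random row-orthonormal matrix rather than a trained one, and a raster scan stands in for the codec's coefficient scan.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes mirroring the AV2-style configuration for TB >= 8x8 with a DCT-2
# primary transform: n = 48 low-frequency coefficients, kernel K is 32x48.
n, m = 48, 32

# Stand-in "learned" kernel: rows orthonormalized via QR so that K K^T = I_m.
# (In the codec, K comes from offline training on clustered residuals.)
Q, _ = np.linalg.qr(rng.standard_normal((n, m)))
K = Q.T                        # shape (m, n)

def ist_forward(C, scan_idx):
    """Apply the secondary transform to the masked low-frequency coeffs."""
    c = C.flat[scan_idx]       # mask M: gather the n selected coefficients
    return K @ c               # y = K c; only y is quantized and coded

def ist_inverse(y, C_shape, scan_idx):
    """Reconstruct the masked coefficients via the transpose kernel."""
    c_hat = K.T @ y            # c_hat = K^T y (rows of K are orthonormal)
    C_hat = np.zeros(C_shape)
    C_hat.flat[scan_idx] = c_hat   # scatter back: the M^T step
    return C_hat

# Example on an 8x8 primary-transformed block; raster order stands in
# for the codec's low-frequency scan.
C = rng.standard_normal((8, 8))
scan = np.arange(n)
y = ist_forward(C, scan)
C_hat = ist_inverse(y, C.shape, scan)
```

Because $m < n$, the round trip is a projection onto the kernel's row space, not an exact inverse; the codec recovers exactness only up to quantization within that subspace.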
Signaling syntax differs between intra- and inter-modes:
- Intra IST: several kernel sets, signaled via set and kernel indices.
- Inter IST: a single kernel set, so only the kernel index is signaled.
Memory and compute costs are tightly bounded. For $8 \times 8$ blocks or larger, the dominant kernel sizes are $32 \times 48$ (DCT-2) or $20 \times 48$ (ADST), with under 15 multiplies per pixel in the worst case, and no FFT-style acceleration is required (Nalci et al., 6 Jan 2026).
Neuromorphic/Transformer IST
After embedding the input into tokens $x_1, \ldots, x_T \in \mathbb{R}^{d}$:
- Intra-token (per-token channel mixing): $z_t = f_{\text{intra}}(x_t)$, applied independently for each $t$. Realized as spiking neural networks (SNNs) with leaky integrate-and-fire (LIF) neurons or feedforward subnets.
- Inter-token (across-token position mixing): for each channel $j$, $y_{1:T,j} = f_{\text{inter}}(z_{1:T,j})$. Realized as SSMs (state-space models), softmax self-attention, or neuromorphic approximations thereof using spike-based coincidence or stochastic codes.
A schematic architecture places intra-token SNNs/FFNs before and after an inter-token (SSM or self-attention) module within each layer (Simeone, 1 Jan 2026).
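A toy numpy sketch of this layer layout, with an FFN standing in for the intra-token stage and a diagonal state-space recurrence standing in for the inter-token stage (all parameter names and sizes are illustrative, not from the cited work):

```python
import numpy as np

rng = np.random.default_rng(1)
T, d = 6, 8   # sequence length, embedding dimension (toy sizes)

# Illustrative FFN weights for the intra-token stage.
W1 = rng.standard_normal((d, 4 * d)) / np.sqrt(d)
W2 = rng.standard_normal((4 * d, d)) / np.sqrt(4 * d)

def intra_token(X):
    """Channel mixing applied to each token independently (FFN-style)."""
    return np.maximum(X @ W1, 0.0) @ W2        # (T, d) -> (T, d)

def inter_token(X, a=0.9):
    """Position mixing per channel: a diagonal state-space recurrence
    h_t = a * h_{t-1} + x_t, run independently for every channel."""
    H = np.zeros_like(X)
    h = np.zeros(X.shape[1])
    for t in range(X.shape[0]):
        h = a * h + X[t]
        H[t] = h
    return H

X = rng.standard_normal((T, d))                 # token embeddings
Y = intra_token(inter_token(intra_token(X)))    # intra -> inter -> intra
```

The composition order mirrors the schematic: channel mixing before and after the context-building stage, with only the inter-token stage coupling positions.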
4. Rate-Distortion and Coding Efficiency
Joint optimization of primary and IST kernels via rate-distortion objectives is central. In AVM codec experiments, joint clustering and separate path-graph transform (SPGT) design yielded the lowest total RD cost. Explicit improvements are:
- Smaller blocks (mean over 12 intra modes): BD-rate savings using joint/SPGT IST over a DCT/ADST-only baseline.
- Larger blocks: further BD-rate savings with joint/SPGT IST (Pakiyarajah et al., 21 May 2025). In AV2, IST alone produces BD-rate reductions (PSNR) in all-intra natural video, random-access, and low-delay configurations, making it the largest contributor among residual-related tools (Nalci et al., 6 Jan 2026).
IST integration yields only minimal run-time overhead; all heavy optimization and kernel learning occur offline, with negligible signaling (2 bits for the primary transform, 1 bit for the IST apply/bypass flag) (Pakiyarajah et al., 21 May 2025).
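The BD-rate figures reported throughout this section follow the standard Bjøntegaard metric. A minimal sketch of that computation (the function name and the conventional cubic fit of log-rate against PSNR are ours):

```python
import numpy as np

def bd_rate(rate_a, psnr_a, rate_b, psnr_b):
    """Bjontegaard delta-rate: average % bitrate change of codec B vs. A
    at equal quality, via cubic fits of log-rate as a function of PSNR."""
    la, lb = np.log(rate_a), np.log(rate_b)
    pa = np.polyfit(psnr_a, la, 3)
    pb = np.polyfit(psnr_b, lb, 3)
    # Integrate both fits over the overlapping PSNR range.
    lo = max(min(psnr_a), min(psnr_b))
    hi = min(max(psnr_a), max(psnr_b))
    ia, ib = np.polyint(pa), np.polyint(pb)
    avg_a = (np.polyval(ia, hi) - np.polyval(ia, lo)) / (hi - lo)
    avg_b = (np.polyval(ib, hi) - np.polyval(ib, lo)) / (hi - lo)
    return (np.exp(avg_b - avg_a) - 1.0) * 100.0
```

A negative return value means codec B (e.g., the IST-enabled encoder) needs less bitrate than codec A at the same PSNR.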
5. Comparative Analysis in Neuromorphic and Transformer Models
IST divides post-embedding processing into intra-token (FFN/SNN, independent per token) and inter-token (attention/SSM, context-building) axes. Intra-token SNNs, operating over a virtual time axis for each token, support highly sparse and energy-efficient computation, approaching theoretical improvements of 30–100× in certain tasks (e.g., keyword spotting, image classification) versus dense GPU approaches. Inter-token modules enable online context integration with O(N) or O(N²) cost, with sparse attention or SSMs enabling further energy reduction. Both are compatible with surrogate-gradient training or local/plasticity rules, maintaining >95% accuracy relative to dense transformer or SSM baselines while reducing computational overhead by more than an order of magnitude (Simeone, 1 Jan 2026).
6. Extensions, Application Domains, and Relationship to Prior Art
ISTs in video coding generalize and improve upon prior non-separable transforms such as NSST (HEVC) and LFNST (VVC) by allowing efficient inclusion in both intra- and inter-modes, restricting support to low-frequency coefficients, and minimizing signaling/compute overhead. AV1 (predecessor of AV2) did not deploy secondary transforms, representing a substantial step forward in AV2 (Nalci et al., 6 Jan 2026).
In neuromorphic AI, the intra/inter IST distinction is reflected in advances in spiking transformers (Spikformer, SpikeGPT) and SSM-based SNNs (P-SpikeSSM), supporting applications in sequence modeling, language, and energy-constrained domains. ISTs, as modular processing units, provide a scalable template for balancing energy use, context integration, and representational flexibility, central desiderata in next-generation AI hardware and software (Simeone, 1 Jan 2026).
7. Summary Table: IST Variants in Video Coding
| Context | IST Support Size | Kernel Shape | Application Mode(s) |
|---|---|---|---|
| TB < 8×8 | 16 coeffs | 8×16 | Intra/Inter (AV2) |
| TB ≥ 8×8, DCT-2 | 48 coeffs | 32×48 | Intra/Inter (AV2) |
| TB ≥ 8×8, ADST | 48 coeffs | 20×48 | Intra (AV2) |
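The table above can be encoded directly as a lookup (function and key names are illustrative):

```python
def ist_config(tb_min_dim, primary, mode):
    """Map (transform-block min dimension, primary transform, prediction
    mode) to the IST support size and kernel shape, per the AV2 table."""
    if tb_min_dim < 8:
        return {"support": 16, "kernel": (8, 16)}    # intra and inter
    if primary == "DCT-2":
        return {"support": 48, "kernel": (32, 48)}   # intra and inter
    if primary == "ADST" and mode == "intra":
        return {"support": 48, "kernel": (20, 48)}
    return None   # no IST for this combination
```

Each kernel shape is (m, n): n masked input coefficients mapped to m secondary-transform outputs.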
ISTs are now established as a core design pattern for improving compaction and energy efficiency both in video codecs and in AI hardware/software, with mature methodologies for kernel learning, signaling, mode selection, and integration into block-by-block pipelines (Nalci et al., 6 Jan 2026, Pakiyarajah et al., 21 May 2025, Simeone, 1 Jan 2026).