Frequency Component Exchange KAN
- Frequency Component Exchange KAN (FCEKAN) is a neural module that applies Fourier decomposition to exchange frequency components between RGB and infrared imagery.
- It leverages the Kolmogorov–Arnold Network formalism to explicitly swap sine and cosine feature components between modalities, enabling complementary cross-modal reconstruction.
- FCEKAN integrates with the SFFR framework using learnable coefficients and gating mechanisms to optimize spatial-frequency fusion for robust aerial object detection.
The Frequency Component Exchange KAN (FCEKAN) is a neural module developed for multispectral aerial object detection within the SFFR (Spatial and Frequency Feature Reconstruction) framework. FCEKAN leverages the Kolmogorov–Arnold Network (KAN) formalism to represent and exchange frequency components between RGB and infrared (IR) imagery, enabling cross-modal reconstruction of complementary features prior to spatial-frequency fusion. Its primary innovation is the explicit, learnable exchange of Fourier-decomposed feature components at selected frequency bands between modalities, augmenting semantic consistency and enhancing detection robustness across varying conditions.
1. Kolmogorov–Arnold Network Foundations
FCEKAN is grounded in the Kolmogorov–Arnold representation theorem, which expresses any continuous multivariate mapping as a sum of compositions of univariate "inner" functions and "outer" functions:

$$f(x_1,\dots,x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right).$$

Within KANs, each network layer parameterizes a suite of edge-wise, one-dimensional activations $\phi_{l,j,i}$:

$$x_{l+1,j} = \sum_{i} \phi_{l,j,i}(x_{l,i}),$$

yielding a stack of functionally composed operators $\mathrm{KAN}(x) = (\Phi_{L-1} \circ \cdots \circ \Phi_{0})(x)$. Edge activations are detailed as linear combinations of $\mathrm{silu}(x) = x\cdot\sigma(x)$ (the sigmoid-weighted linear unit) and B-spline expansions,

$$\phi(x) = w_b\,\mathrm{silu}(x) + w_s \sum_i c_i B_i(x),$$

each with learnable weights $w_b, w_s$ and coefficients $c_i$.
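As a point of reference, the following minimal NumPy/SciPy sketch illustrates this edge-wise parameterization; the function names, the use of scipy.interpolate.BSpline, and the parameter layout are illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.interpolate import BSpline

def kan_edge_activation(x, w_b, w_s, knots, spline_coeffs, degree=3):
    """One KAN edge activation: phi(x) = w_b * silu(x) + w_s * spline(x).

    knots and spline_coeffs parameterize a learnable 1-D B-spline
    (len(knots) == len(spline_coeffs) + degree + 1); w_b and w_s are
    learnable scalar weights (names are illustrative)."""
    silu = x / (1.0 + np.exp(-x))                       # sigmoid-weighted linear unit
    spline = BSpline(knots, spline_coeffs, degree)(x)   # B-spline expansion
    return w_b * silu + w_s * spline

def kan_layer(x, edge_params):
    """A KAN layer: output unit j sums edge-wise univariate activations of the
    inputs; edge_params[j][i] holds the parameters of the activation on edge i -> j."""
    return np.array([
        sum(kan_edge_activation(x[i], **p) for i, p in enumerate(edges))
        for edges in edge_params
    ])
```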
2. Frequency-Domain Expansion and Representation
After preliminary backbone feature extraction, modality-specific feature maps $F_R$ and $F_T$ of shape $H \times W \times d$ are flattened to $N = H \times W$ patch locations, denoted $M_R \in \mathbb{R}^{N \times d}$ for RGB and $M_T \in \mathbb{R}^{N \times d}$ for IR, with $d$ channels. FCEKAN applies a "FourierKAN" intra-modal expansion, projecting each feature to a sine/cosine basis:

$$\phi(M[i,j]) = \sum_{k=1}^{g} \big(a_{j,k}\cos(k\,M[i,j]) + b_{j,k}\sin(k\,M[i,j])\big),$$

where frequencies $k \in \{1,\dots,g\}$, and coefficients $a_{j,k}, b_{j,k}$ are optimized per channel and frequency. This allows explicit, differentiable decomposition of feature responses into spatial frequency components.
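A vectorized sketch of this intra-modal expansion, assuming the $M \in \mathbb{R}^{N \times d}$ and $a, b \in \mathbb{R}^{d \times g}$ layout described above (NumPy is used here purely for illustration):

```python
import numpy as np

def fourier_expand(M, a, b):
    """FourierKAN-style intra-modal expansion (illustrative sketch).

    M: (N, d) flattened features; a, b: (d, g) learnable cosine/sine coefficients.
    Returns phi with phi[i, j] = sum_k a[j, k] cos(k M[i, j]) + b[j, k] sin(k M[i, j])."""
    g = a.shape[1]
    k = np.arange(1, g + 1)          # frequencies k = 1..g
    angles = M[..., None] * k        # shape (N, d, g)
    return (np.cos(angles) * a).sum(-1) + (np.sin(angles) * b).sum(-1)
```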
3. Selective Frequency Component Exchange Mechanism
At the core of FCEKAN is the selective exchange of frequency bands between modalities. The system partitions the available frequency indices $k \in \{1,\dots,g\}$, typically into low (coarse structure) and high (edge/texture) sub-bands, and swaps the Fourier components of one modality with those of the other in selected bands. Empirically, optimal performance was observed with a global swap: all cosine terms from RGB were fused with all sine terms from IR, yielding the cross-modal expansion

$$M'_R[i,j] = \sum_{k=1}^{g} \big(a_{j,k}\cos(k\,M_R[i,j]) + b_{j,k}\sin(k\,M_T[i,j])\big),$$

with the analogous expression for $M'_T$ obtained by swapping the roles of the modalities. This encourages the network to reconstruct missing or weak semantic cues using cross-modal spectral information (e.g., high-frequency RGB texture replaced by IR structural signatures under poor illumination).
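In the same illustrative notation, the global-swap variant can be sketched as below; the pairing of cosine terms with RGB and sine terms with IR follows the expansion above, while the function and variable names are assumptions.

```python
import numpy as np

def global_swap_expand(M_R, M_T, a, b):
    """Global frequency-component swap (sketch): the RGB branch keeps its cosine
    terms and takes its sine terms from IR; the IR branch does the reverse."""
    g = a.shape[1]
    k = np.arange(1, g + 1)          # frequencies k = 1..g
    ang_R = M_R[..., None] * k       # (N, d, g) phase terms per modality
    ang_T = M_T[..., None] * k
    M_R_new = (np.cos(ang_R) * a).sum(-1) + (np.sin(ang_T) * b).sum(-1)
    M_T_new = (np.cos(ang_T) * a).sum(-1) + (np.sin(ang_R) * b).sum(-1)
    return M_R_new, M_T_new
```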
4. Learnable Parameters, Gates, and Cross-Branch Integration
FCEKAN's primary learnable parameters are the Fourier basis coefficient tensors $a \in \mathbb{R}^{d \times g}$ and $b \in \mathbb{R}^{d \times g}$. While the frequency exchange pattern can be controlled via binary or soft gates over the frequency indices, the published strategy utilizes a fixed global swap. To merge frequency-domain (FCEKAN) and spatial-domain (MSGKAN) outputs, the SFFR head linearly combines them with scalar gates $\alpha$ and $\beta$, which are initialized uniformly and optimized via the detection objective:

$$F^{\mathrm{out}} = \alpha\, F^{\mathrm{FCEKAN}} + \beta\, F^{\mathrm{MSGKAN}},$$

where $\alpha$ and $\beta$ govern the relative contributions of frequency and spatial features to the fused representation.
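The gated combination itself is a simple linear mix; a minimal sketch, assuming scalar gates initialized to 0.5 each (the exact initialization value and names are assumptions):

```python
def gated_fuse(F_fcekan, F_msgkan, alpha=0.5, beta=0.5):
    """Scalar-gated combination of frequency-branch (FCEKAN) and spatial-branch
    (MSGKAN) features; alpha and beta are learned jointly with the detector."""
    return alpha * F_fcekan + beta * F_msgkan
```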
5. Post-Exchange Fusion and Final Feature Construction
The outputs $M'_R$, $M'_T$ are further processed within a KANFusion block, which normalizes and mixes them with the original backbone features through learnable scalars on the enhancement and residual paths, a subsequent MLP, and residual addition; the IR branch is handled analogously. This multi-stage mixing imparts flexibility in how much cross-modal and original information is retained in downstream processing.
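One plausible reading of this mixing step is sketched below; the exact composition order (scale, normalize, MLP, residual add) and the names s_enh, s_res, and mlp are assumptions consistent with the description above, not the published formulation.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Per-location normalization over the channel axis
    mu = x.mean(-1, keepdims=True)
    sigma = x.std(-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def kan_fusion(F_backbone, M_exchanged, s_enh, s_res, mlp):
    """KANFusion-style mixing (sketch): scale the exchanged frequency features,
    normalize the mix with the backbone features, pass the result through an MLP,
    and add it back to the backbone features with a residual scalar."""
    mixed = layer_norm(F_backbone + s_enh * M_exchanged)
    return F_backbone + s_res * mlp(mixed)
```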
6. Design Rationales and Cross-Modal Semantics
Low-frequency image components fundamentally encode object shape and silhouette; high frequencies address local detail, edges, and texture. RGB imagery is typically richer in high-frequency content under favorable imaging conditions, while IR captures persistent low-frequency structural elements regardless of illumination. FCEKAN’s selective exchange mechanism enables complementary feature recovery, particularly under challenging acquisition circumstances or occlusions, by reconstructing missing frequency bands from the alternate modality. The trainable sine/cosine mixing provided by the KAN framework confers nonlinear modeling capacity to the exchanged representations.
7. Algorithmic Outline and Integration into SFFR Framework
The operational procedure for FCEKAN can be outlined as follows:
```python
import numpy as np

def fcekan_exchange(M_R, M_T, a, b, exchange_C, exchange_S):
    """Selective frequency-component exchange between RGB (M_R) and IR (M_T).

    M_R, M_T: (N, d) flattened feature maps; a, b: (d, g) Fourier coefficients;
    exchange_C, exchange_S: length-g boolean masks marking bands whose cosine /
    sine components are taken from the other modality instead of the default."""
    N, d = M_R.shape
    g = a.shape[1]
    M_R_out = np.zeros((N, d))
    M_T_out = np.zeros((N, d))
    for i in range(N):              # spatial indices
        for j in range(d):          # channels
            for k in range(g):      # frequency bands (frequency index f = k + 1)
                f = k + 1
                C_R, S_R = np.cos(f * M_R[i, j]), np.sin(f * M_R[i, j])
                C_T, S_T = np.cos(f * M_T[i, j]), np.sin(f * M_T[i, j])
                # Selective exchange for the RGB branch: cosine from RGB and
                # sine from IR by default, flipped in bands marked for exchange
                C_out = (C_T if exchange_C[k] else C_R) * a[j, k]
                S_out = (S_R if exchange_S[k] else S_T) * b[j, k]
                M_R_out[i, j] += C_out + S_out
                # Analogous terms for the IR branch, roles of R and T swapped
                C_out_T = (C_R if exchange_C[k] else C_T) * a[j, k]
                S_out_T = (S_T if exchange_S[k] else S_R) * b[j, k]
                M_T_out[i, j] += C_out_T + S_out_T
    return M_R_out, M_T_out
```
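Continuing the sketch above, a minimal usage example with random features; with both exchange masks all-False, the loop reduces to the global swap described in Section 3 (cosine from RGB, sine from IR):

```python
N, d, g = 4, 8, 5
rng = np.random.default_rng(0)
M_R, M_T = rng.normal(size=(N, d)), rng.normal(size=(N, d))
a, b = rng.normal(size=(d, g)), rng.normal(size=(d, g))
no_exchange = np.zeros(g, dtype=bool)        # keep the default global swap
M_R_new, M_T_new = fcekan_exchange(M_R, M_T, a, b, no_exchange, no_exchange)
```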
The $\alpha$-scaled FCEKAN outputs and $\beta$-scaled MSGKAN spatial features are linearly combined, yielding final modality representations for subsequent KANFusion and downstream detection (Zuo et al., 9 Nov 2025). FCEKAN thus constitutes a pivotal module for explicit frequency-based cross-modal feature synthesis within modern multispectral aerial perception architectures.