Frequency Component Exchange KAN

Updated 16 November 2025
  • Frequency Component Exchange KAN (FCEKAN) is a neural module that applies Fourier decomposition to exchange frequency components between RGB and infrared imagery.
  • It leverages the Kolmogorov–Arnold Network formalism to explicitly swap sine and cosine feature components, ensuring complementary cross-modal reconstruction.
  • FCEKAN integrates with the SFFR framework using learnable coefficients and gating mechanisms to optimize spatial-frequency fusion for robust aerial object detection.

The Frequency Component Exchange KAN (FCEKAN) is a neural module developed for multispectral aerial object detection within the SFFR (Spatial and Frequency Feature Reconstruction) framework. FCEKAN leverages the Kolmogorov–Arnold Network (KAN) formalism to represent and exchange frequency components between RGB and infrared (IR) imagery, enabling cross-modal reconstruction of complementary features prior to spatial-frequency fusion. Its primary innovation is the explicit, learnable exchange of Fourier-decomposed feature components at selected frequency bands between modalities, augmenting semantic consistency and enhancing detection robustness across varying conditions.

1. Kolmogorov–Arnold Network Foundations

FCEKAN is grounded in the Kolmogorov–Arnold representation theorem, which expresses any continuous multivariate mapping $f(x_1, \ldots, x_n)$ as a sum of compositions of univariate "inner" functions $\phi$ and "outer" functions $\Phi$:

$$f(x_1,\dots,x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left(\sum_{p=1}^n \phi_{q,p}(x_p)\right).$$

Within KANs, each network layer $\ell$ parameterizes a suite of edge-wise, one-dimensional activations,

$$\Phi^{(\ell)} = \{\phi^{(\ell)}_{q,p}\}_{q=1..2n+1,\;p=1..n},$$

yielding a stack of $L$ functionally composed operators:

$$\mathrm{KAN}(x) = (\Phi^{(L-1)} \circ \cdots \circ \Phi^{(0)})(x).$$

Edge activations are detailed as linear combinations of $\mathrm{SiLU}(x)$ (the sigmoid-weighted linear unit) and B-spline expansions, each with learnable weights and coefficients.
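As a rough illustration of the edge-wise activations described above, the following NumPy sketch evaluates one KAN layer in which every edge applies a weighted SiLU term plus a spline term, and each output sums its incoming edges. It is a sketch under stated assumptions, not the reference implementation: the names `kan_layer` and `hat_basis`, the uniform grid, and the use of degree-1 B-splines (hat functions) in place of higher-order splines are all illustrative choices that do not come from the source.

```python
import numpy as np

def silu(x):
    """Sigmoid-weighted linear unit: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def hat_basis(x, grid):
    """Degree-1 B-spline (hat) basis on a uniform grid (assumed simplification).

    x: (...,) array; grid: (G,) uniformly spaced knots. Returns (..., G).
    """
    h = grid[1] - grid[0]
    return np.clip(1.0 - np.abs(x[..., None] - grid) / h, 0.0, None)

def kan_layer(x, w_base, w_spline, coef, grid):
    """One KAN layer: out[:, q] = sum_p phi_{q,p}(x[:, p]).

    x: (batch, n_in); w_base, w_spline: (n_out, n_in); coef: (n_out, n_in, G).
    """
    base = silu(x)                                        # (batch, n_in)
    basis = hat_basis(x, grid)                            # (batch, n_in, G)
    spline = np.einsum("bpg,qpg->bqp", basis, coef)       # (batch, n_out, n_in)
    phi = w_base * base[:, None, :] + w_spline * spline   # edge activations phi_{q,p}(x_p)
    return phi.sum(axis=-1)                               # sum over inputs p

# Hypothetical toy sizes: 3 inputs, 2 outputs, 5 grid points on [-1, 1]
rng = np.random.default_rng(0)
grid = np.linspace(-1.0, 1.0, 5)
x = rng.uniform(-1.0, 1.0, size=(4, 3))
y = kan_layer(x, rng.normal(size=(2, 3)), rng.normal(size=(2, 3)),
              rng.normal(size=(2, 3, 5)), grid)           # shape (4, 2)
```

Stacking several such layers gives the composed operator written above; in a trained KAN the weights and spline coefficients would be learned rather than drawn at random.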

2. Frequency-Domain Expansion and Representation

After preliminary backbone feature extraction, modality-specific feature maps $F_{RGB}$ and $F_{IR}$ of shape $B \times C \times H \times W$ are flattened to $N = B \cdot H \cdot W$ patch locations, denoted $M_R \in \mathbb{R}^{N \times d}$ for RGB and $M_T \in \mathbb{R}^{N \times d}$ for IR, with $d = C$. FCEKAN applies a "FourierKAN" intra-modal expansion, projecting each feature to a sine/cosine basis:

$$f_\mathrm{fourier}(M_i) = \sum_{j=1}^{d}\sum_{k=1}^{g}\left[\cos(k M_{i,j})\,a_{j,k} + \sin(k M_{i,j})\,b_{j,k}\right],$$

where the frequencies satisfy $k \leq g$ and the coefficients $a_{j,k}, b_{j,k}$ are optimized per channel and frequency. This allows explicit, differentiable decomposition of feature responses into spatial frequency components.
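The expansion above can be sketched in a few lines of NumPy. This is a literal reading of the formula as written, which sums over both channels and frequencies and so yields one value per patch location; a practical layer may well keep a per-channel or output-channel index instead, which the formula does not show. The function name `fourier_expand` and the toy sizes are hypothetical.

```python
import numpy as np

def fourier_expand(M, a, b):
    """Literal reading of the FourierKAN expansion.

    M: (N, d) flattened features; a, b: (d, g) cosine/sine coefficients.
    Returns (N,): one value per location, summed over channels j and frequencies k.
    """
    g = a.shape[1]
    k = np.arange(1, g + 1)            # frequencies k = 1..g
    arg = M[..., None] * k             # (N, d, g), entries k * M[i, j]
    return (np.cos(arg) * a + np.sin(arg) * b).sum(axis=(1, 2))

# Hypothetical toy sizes: N = 5 locations, d = 3 channels, g = 4 frequencies
rng = np.random.default_rng(0)
M = rng.normal(size=(5, 3))
a = rng.normal(size=(3, 4))
b = rng.normal(size=(3, 4))
y = fourier_expand(M, a, b)            # shape (5,)
```

Because every operation is a smooth elementwise function followed by a sum, the same expression is differentiable with respect to both the features and the coefficients when written in an autodiff framework.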

3. Selective Frequency Component Exchange Mechanism

At the core of FCEKAN is selective exchange of frequency bands between modalities. The system partitions the available frequency indices $\{1, \ldots, g\}$, typically into low (coarse structure) and high (edge/texture) sub-bands. The module swaps Fourier components of one modality with those of the other in selected bands. Empirically, optimal performance was observed upon global swapping: all cosine terms from RGB were fused with all sine terms from IR, yielding the cross-modal expansion:

$$f_{\mathrm{Cross}}(M_R, M_T) = \sum_{j=1}^{d}\sum_{k=1}^{g}\left[\cos(k M_{R,j})\,a_{j,k} + \sin(k M_{T,j})\,b_{j,k}\right].$$

This encourages the network to reconstruct missing or weak semantic cues using cross-modal spectral information, for example high-frequency RGB texture replaced by IR structural signatures under poor illumination.
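A minimal sketch of the global swap, under the same literal reading of the formula (one value per location), may help make the asymmetry concrete. The function names `f_cross` and `f_fourier` mirror the symbols in the text but are otherwise hypothetical, as are the toy sizes. One useful sanity check is that feeding the same features to both arguments recovers the intra-modal expansion.

```python
import numpy as np

def f_cross(M_own, M_other, a, b):
    """Global swap: cosine terms from M_own, sine terms from M_other.

    M_own, M_other: (N, d); a, b: (d, g). Returns (N,).
    """
    k = np.arange(1, a.shape[1] + 1)               # frequencies k = 1..g
    cos_own = np.cos(M_own[..., None] * k)         # (N, d, g)
    sin_other = np.sin(M_other[..., None] * k)     # (N, d, g)
    return (cos_own * a + sin_other * b).sum(axis=(1, 2))

def f_fourier(M, a, b):
    """Intra-modal expansion, i.e. the cross expansion with identical inputs."""
    return f_cross(M, M, a, b)

# Hypothetical toy data for the two modalities
rng = np.random.default_rng(1)
M_R = rng.normal(size=(6, 3))
M_T = rng.normal(size=(6, 3))
a = rng.normal(size=(3, 4))
b = rng.normal(size=(3, 4))

rgb_branch = f_cross(M_R, M_T, a, b)   # RGB cosine terms + IR sine terms
ir_branch = f_cross(M_T, M_R, a, b)    # IR cosine terms + RGB sine terms
```

The two branches generally differ, since swapping the argument order changes which modality supplies the cosine terms and which supplies the sine terms.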

4. Learnable Parameters, Gates, and Cross-Branch Integration

FCEKAN's primary learnable parameters are the Fourier basis coefficient tensors $a_{j,k}$ and $b_{j,k}$. While the frequency exchange pattern can be controlled via binary or soft gates $G(k)\in[0,1]$, the published strategy utilizes a fixed global swap. To merge frequency-domain (FCEKAN) and spatial-domain (MSGKAN) outputs, the SFFR head linearly combines them with scalar gates $\alpha, \beta$, which are initialized uniformly and optimized via the detection objective:

$$\begin{aligned} M'_R &= \alpha\,\mathrm{Conv}\big(f_{\mathrm{gus}}(M_R)\big) + \beta\,f_{\mathrm{Cross}}(M_R, M_T), \\ M'_T &= \alpha\,\mathrm{Conv}\big(f_{\mathrm{gus}}(M_T)\big) + \beta\,f_{\mathrm{Cross}}(M_T, M_R). \end{aligned}$$

The gates $\alpha, \beta$ govern the relative contributions of spatial and frequency features to the fused representation.
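The source mentions soft gates but does not say how they would be parameterized, so the sketch below is only one plausible reading: a per-frequency gate obtained from a sigmoid over learnable logits, blending a modality's own sine terms with the other modality's. The names `soft_gated_cross`, `combine_branches`, and `gate_logits`, the equal starting values for the two scalar gates, and the stand-in spatial features are all assumptions. The output is kept per channel (summed over frequencies only), as in the algorithmic outline in Section 7, so that it can be added to a spatial branch of the same shape.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_gated_cross(M_own, M_other, a, b, gate_logits):
    """Soft exchange of sine terms with gate G(k) = sigmoid(gate_logits[k-1]).

    G(k) near 0 keeps the own-modality sine term; G(k) near 1 takes the other's.
    M_own, M_other: (N, d); a, b: (d, g); gate_logits: (g,). Returns (N, d).
    """
    k = np.arange(1, a.shape[1] + 1)                 # frequencies k = 1..g
    G = sigmoid(gate_logits)                         # (g,), values in (0, 1)
    arg_own = M_own[..., None] * k                   # (N, d, g)
    arg_other = M_other[..., None] * k               # (N, d, g)
    sin_mix = (1.0 - G) * np.sin(arg_own) + G * np.sin(arg_other)
    return (np.cos(arg_own) * a + sin_mix * b).sum(axis=-1)

def combine_branches(spatial, frequency, alpha, beta):
    """Scalar-gated sum of the spatial and frequency branches."""
    return alpha * spatial + beta * frequency

# Hypothetical toy data; `spatial_R` stands in for the MSGKAN branch output.
rng = np.random.default_rng(2)
M_R = rng.normal(size=(5, 3))
M_T = rng.normal(size=(5, 3))
a = rng.normal(size=(3, 4))
b = rng.normal(size=(3, 4))
spatial_R = rng.normal(size=(5, 3))

freq_R = soft_gated_cross(M_R, M_T, a, b, gate_logits=np.zeros(4))  # G(k) = 0.5
M_R_new = combine_branches(spatial_R, freq_R, alpha=0.5, beta=0.5)  # shape (5, 3)
```

Pushing every logit strongly positive approaches the fixed global swap used in the published strategy, while strongly negative logits approach the purely intra-modal expansion.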

5. Post-Exchange Fusion and Final Feature Construction

The outputs $M'_R$, $M'_T$ are further processed within a KANFusion block, which normalizes and mixes these with the original backbone features. This is achieved through learnable scalars for both the enhancement and residual paths, a subsequent MLP, and residual addition:

$$\begin{aligned} Z_R &= \mathrm{LayerNorm}(a X_R + b Y_R), \\ U_R &= \mathrm{MLP}(Z_R), \\ O_R &= c X_R + d U_R, \end{aligned}$$

and analogously for the IR branch. Here $a, b, c, d$ are scalar mixing weights, not the Fourier coefficients $a_{j,k}, b_{j,k}$ or the channel dimension $d$ of the earlier sections. This multi-stage mixing imparts flexibility in how much cross-modal and original information is retained in downstream processing.
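The three fusion steps can be sketched as follows. The source does not specify the MLP width or activation, nor whether the normalization carries affine parameters, so the two-layer ReLU MLP and the parameter-free layer normalization here are assumptions, as are the function names and toy sizes.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize over the last (channel) axis; no affine parameters (assumed)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def mlp(z, W1, b1, W2, b2):
    """Two-layer MLP with a ReLU hidden layer (hypothetical architecture)."""
    hidden = np.maximum(z @ W1 + b1, 0.0)
    return hidden @ W2 + b2

def kan_fusion(X, Y, a, b, c, d, W1, b1, W2, b2):
    """Z = LayerNorm(a*X + b*Y); U = MLP(Z); O = c*X + d*U.

    X, Y: (N, channels); a, b, c, d: scalar mixing weights.
    """
    Z = layer_norm(a * X + b * Y)
    U = mlp(Z, W1, b1, W2, b2)
    return c * X + d * U

# Hypothetical toy sizes: 6 locations, 4 channels, hidden width 8
rng = np.random.default_rng(3)
X_R = rng.normal(size=(6, 4))
Y_R = rng.normal(size=(6, 4))
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 4)), np.zeros(4)

O_R = kan_fusion(X_R, Y_R, 1.0, 1.0, 1.0, 1.0, W1, b1, W2, b2)   # shape (6, 4)
```

Setting the last scalar to zero reduces the block to a scaled copy of its first input, which shows how the scalars let training dial the contribution of the MLP path up or down.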

6. Design Rationales and Cross-Modal Semantics

Low-frequency image components fundamentally encode object shape and silhouette; high frequencies address local detail, edges, and texture. RGB imagery is typically richer in high-frequency content under favorable imaging conditions, while IR captures persistent low-frequency structural elements regardless of illumination. FCEKAN’s selective exchange mechanism enables complementary feature recovery, particularly under challenging acquisition circumstances or occlusions, by reconstructing missing frequency bands from the alternate modality. The trainable sine/cosine mixing provided by the KAN framework confers nonlinear modeling capacity to the exchanged representations.

7. Algorithmic Outline and Integration into SFFR Framework

The operational procedure for FCEKAN expands as follows:

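import numpy as np

def fcekan_exchange(M_R, M_T, a, b, swap_cos=None, swap_sin=None):
    """M_R, M_T: (N, d) features; a, b: (d, g) coefficients; masks: (g,) booleans."""
    g = a.shape[1]
    k = np.arange(1, g + 1)  # frequencies 1..g, matching the formulas above
    # Default masks reproduce the global swap: own cosine terms, other modality's sine terms.
    swap_cos = np.zeros(g, dtype=bool) if swap_cos is None else np.asarray(swap_cos, dtype=bool)
    swap_sin = np.ones(g, dtype=bool) if swap_sin is None else np.asarray(swap_sin, dtype=bool)
    C_R, S_R = np.cos(k * M_R[..., None]), np.sin(k * M_R[..., None])  # each (N, d, g)
    C_T, S_T = np.cos(k * M_T[..., None]), np.sin(k * M_T[..., None])
    # Selective exchange: where a mask is True, take that component from the other modality.
    out_R = (np.where(swap_cos, C_T, C_R) * a + np.where(swap_sin, S_T, S_R) * b).sum(axis=-1)
    # IR branch: same rule with the roles of R and T swapped.
    out_T = (np.where(swap_cos, C_R, C_T) * a + np.where(swap_sin, S_R, S_T) * b).sum(axis=-1)
    return out_R, out_T  # M'_R and M'_T, each (N, d)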

The $\beta$-scaled FCEKAN outputs and $\alpha$-scaled MSGKAN spatial features are linearly combined, yielding final modality representations for subsequent KANFusion and downstream detection (Zuo et al., 9 Nov 2025). FCEKAN thus constitutes a pivotal module for explicit frequency-based cross-modal feature synthesis within modern multispectral aerial perception architectures.
