Interactive Convolution Block (ICB) Overview
- Interactive Convolution Block (ICB) is a network module that employs parallel convolutions with nonlinear gating for context-aware, scale-adaptive feature fusion.
- ICBs integrate diverse mechanisms—multiplicative gating, mixer layers, and collision-inspired designs—to merge features dynamically across multiple paths.
- Their incorporation in CNN architectures yields measurable accuracy improvements and robust performance for image, audio, and time-series applications while balancing computational cost.
An Interactive Convolution Block (ICB) is a convolutional neural network (CNN) architectural module designed to improve feature representation and adaptive capacity by introducing data-dependent interactions between multi-scale or multi-branch convolutional features. In contrast to standard convolutional blocks that apply single-filter, single-path convolutions followed by simple nonlinearities (typically ReLU), ICBs systematically leverage parallel convolutions, nonlinear activations, element-wise gating, or global mixing to enable context-sensitive, nonlinear, and often cross-scale feature fusion. Used across image, audio, and sequence modeling, ICBs have appeared in several forms: as dual-branch nonlinear fusion modules (Ferdaus et al., 2024), as mixer-augmented conv blocks for efficient global pattern extraction (Ng et al., 2022), as time-series local interaction layers (Eldele et al., 2024), and as physics-inspired inter-layer collision structures (An et al., 2019). Their adoption has led to consistent, measurable performance improvements and enhanced robustness over classical implementations.
1. Core Architectures and Mathematical Formulations
ICBs exhibit several distinct architectural patterns, unified by the design principle of explicit feature interaction between parallel convolutional or nonlinear paths.
1.1. Multi-Branch Convolution with Multiplicative Gating
KANICE introduces an ICB in which two convolutions with different kernel sizes (3×3 and 5×5) operate in parallel on the same input tensor $x$, producing feature maps $F_1 = \mathrm{GELU}(\mathrm{Conv}_{3\times 3}(x))$ and $F_2 = \mathrm{GELU}(\mathrm{Conv}_{5\times 5}(x))$ after GELU activation. The block output is the element-wise product $y = F_1 \odot F_2$. This module promotes co-activation across scales; the output is strongly gated unless both branch activations are high (Ferdaus et al., 2024).
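A minimal NumPy sketch of this gating pattern, for a single channel with naive "same"-padded convolutions; `icb_forward`, `conv2d_same`, and the tanh-approximate `gelu` are illustrative helpers, not KANICE's actual implementation:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def conv2d_same(x, k):
    # naive single-channel 2D convolution with zero padding ('same' output size)
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def icb_forward(x, k3, k5):
    # two parallel branches at different scales, each GELU-activated
    f1 = gelu(conv2d_same(x, k3))
    f2 = gelu(conv2d_same(x, k5))
    # Hadamard product: the output is large only where BOTH branches activate
    return f1 * f2
```

Because GELU(0) = 0, a branch that stays near zero suppresses the whole output at that position, which is the co-activation behavior described above.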
1.2. Mixer-Augmented Depthwise Blocks
ConvMixer employs an ICB composed of:
- a 2D depthwise-separable convolution (local spatial extraction),
- a 1D depthwise-separable convolution (temporal/channel refinement),
- two MLP mixer layers implementing (a) temporal mixing and (b) frequency/channel mixing, each with GELU activation and LayerNorm,
- residual connections encompassing both the input and intermediate branches.
The complete forward operator can be summarized as $y = \mathcal{M}\big(f_{1\mathrm{D}}(f_{2\mathrm{D}}(x))\big)$ together with the residual additions above, where $f_{2\mathrm{D}}$ denotes the post-2D-conv stage, $f_{1\mathrm{D}}$ the post-1D-conv stage, and $\mathcal{M}$ the mixer; this composition enables both local and global interaction at efficient computational cost (Ng et al., 2022).
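The mixer stage can be sketched in NumPy as two residual MLP mixes, one across time steps and one across channels. The names `mixer_block`, `W_time`, and `W_chan` are illustrative assumptions; the real block's LayerNorm placement and hidden dimensions may differ:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def layernorm(z, eps=1e-5):
    # normalize each time step (row) over its channel features
    mu = z.mean(axis=-1, keepdims=True)
    var = z.var(axis=-1, keepdims=True)
    return (z - mu) / np.sqrt(var + eps)

def mixer_block(z, W_time, W_chan):
    # z: (T, C) feature map
    # temporal mixing: linear map applied across the time axis, with residual
    z = z + W_time @ gelu(layernorm(z))
    # frequency/channel mixing: linear map across the channel axis, with residual
    z = z + gelu(layernorm(z)) @ W_chan
    return z
```

Both mixes are dense over their respective axes, which is how the block obtains global interaction without attention.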
1.3. Interactive Gated Fusion for Time Series
TSLANet's ICB for time series reconstructs local temporal detail lost in spectral-domain processing. It applies two parallel 1D convolutions of distinct kernel sizes to the input $x$, yielding $Z_1 = \mathrm{Conv}_{k_1}(x)$ and $Z_2 = \mathrm{Conv}_{k_2}(x)$, followed by dual interactive gates $Z_1' = \mathrm{GELU}(Z_1) \odot Z_2$ and $Z_2' = \mathrm{GELU}(Z_2) \odot Z_1$, whose sum is passed to a final 1D convolution. This structure enables rich cross-scale and local feature mixing downstream of adaptive spectral blocks (Eldele et al., 2024).
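The dual-gate pattern can be sketched for a single-channel 1D signal as below. The cross-gating form (each branch's GELU output multiplying the other branch's raw response) and the omission of the final projection convolution are simplifying assumptions about TSLANet's exact layout:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def tslanet_icb(x, k_small, k_large):
    # two parallel 1D convolutions with distinct kernel sizes
    z1 = np.convolve(x, k_small, mode="same")
    z2 = np.convolve(x, k_large, mode="same")
    # dual interactive gates: each branch's activation gates the other
    g1 = gelu(z1) * z2
    g2 = gelu(z2) * z1
    # final projection conv omitted in this sketch
    return g1 + g2
```

The small kernel captures fine local detail while the large kernel supplies broader context, and the multiplicative gates mix the two scales position by position.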
1.4. Physics-Inspired Collision-Based Fusion
IC-Networks define an ICB using a collision term inspired by elastic collisions between layers. For an input $x$, the IC layer merges the direct convolution with a nonlinear residual computed between the standard response and a "rough-feature" (per-channel sum or average) term: $y = f(x) + \lambda\,\sigma\!\big(f(x) - r(x)\big)$, where $f$ is a depthwise-separable convolution, $r(x)$ the rough-feature term, $\lambda$ a scalar gating parameter, and $\sigma$ is ReLU (An et al., 2019).
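A toy NumPy sketch of one plausible collision layer, assuming the form y = f(x) + λ·ReLU(f(x) − rough(x)) over a multi-channel 1D signal; the exact residual term in the paper may differ:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def ic_layer(x, depthwise_kernels, lam=0.5):
    # x: (C, L) multi-channel 1D signal; one kernel per channel (depthwise)
    f = np.stack([np.convolve(x[c], depthwise_kernels[c], mode="same")
                  for c in range(x.shape[0])])
    # "rough feature": per-channel average of the input
    rough = x.mean(axis=1, keepdims=True)
    # collision term: gated nonlinear residual between the standard
    # response and the rough-feature term
    return f + lam * relu(f - rough)
```

Setting the scalar gate `lam` to zero recovers the plain depthwise convolution, so the collision term is a strict addition on top of the standard response.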
2. Adaptive Feature Fusion and Nonlinear Capacity
ICBs commonly employ multiplicative gating, interaction terms, or global MLP-based mixing to produce context- or input-dependent activations.
- Multiplicative Gating: Element-wise multiplication (Hadamard product) between branch activations enforces strict co-activation, increasing selectivity and enabling complex, higher-order interactions (Ferdaus et al., 2024, Eldele et al., 2024).
- Additive-Residual and Nonlinear Mixer: MLP-based mixer layers globally recombine tokens/channels, effectively substituting for attention at an order of magnitude lower computational cost (Ng et al., 2022).
- Collision Mechanism: The explicit "collision" nonlinearity can flexibly carve out up to three linear regions (standard ReLU neurons provide two), thus increasing local representational power (An et al., 2019).
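The three-linear-region claim can be illustrated with a 1D toy: a function built from two ReLU breakpoints is piecewise linear with three distinct slopes, versus two for a single ReLU. The simplified form `relu(x) + lam * relu(x - c)` is an illustration of the principle, not the exact IC-layer formula:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def collision_neuron(x, lam=0.5, c=1.0):
    # two ReLU breakpoints (at 0 and at c) carve out three linear regions
    return relu(x) + lam * relu(x - c)

def local_slope(f, x, h=1e-4):
    # numerical derivative, well-defined away from the breakpoints
    return (f(x + h) - f(x - h)) / (2 * h)

# one representative point per region: x < 0, 0 < x < c, x > c
slopes = [local_slope(collision_neuron, x) for x in (-1.0, 0.5, 2.0)]
```

The three sampled slopes (0, 1, and 1 + λ) are pairwise distinct, confirming the extra linear region relative to a standard ReLU neuron.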
These mechanisms contrast with the standard Conv–ReLU pipeline, which applies a single convolution and a fixed pointwise nonlinearity, lacking feature-dependent gating or cross-scale mixing.
3. Implementation Protocols and Computational Analysis
Parameter and FLOP Costs
| ICB Variant | Added Parameters (per block) | FLOPs Overhead | Activation Scheme |
|---|---|---|---|
| KANICE ICB | ≈4× a single 3×3 conv (parallel 3×3 + 5×5 branches) | ≈4× a single 3×3 conv | GELU, Hadamard |
| ConvMixer ICB | 100–143K (full model) | 22.2M MACs (full model) | Swish/GELU, Mixer |
| TSLANet ICB | Dominated by 1D convs; low | Low | GELU, Gating |
| IC-Network ICB | One depthwise-separable conv + scalar gate | Small | ReLU, collision |
- KANICE's ICB is roughly 4× heavier than a single 3×3 conv for the same input/output channels, since it runs parallel 3×3 and 5×5 branches (Ferdaus et al., 2024).
- ConvMixer blocks scale to meet tight parameter/MAC budgets (100K param, 22.2M MACs for full KWS pipeline) and use depthwise/pointwise separable convolutions for efficiency (Ng et al., 2022).
- TSLANet and IC-Network ICBs can be configured with arbitrary kernel sizes to balance locality and cost (Eldele et al., 2024, An et al., 2019).
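The KANICE cost figure can be checked with simple parameter arithmetic; the 64-channel setting below is an arbitrary example:

```python
def conv2d_params(k, c_in, c_out, bias=True):
    # weights k*k*c_in*c_out plus an optional per-output-channel bias
    return k * k * c_in * c_out + (c_out if bias else 0)

c_in = c_out = 64
single = conv2d_params(3, c_in, c_out)                          # one 3x3 conv
icb = conv2d_params(3, c_in, c_out) + conv2d_params(5, c_in, c_out)  # 3x3 + 5x5 branches
print(f"single 3x3: {single}, ICB (3x3 + 5x5): {icb}, ratio: {icb / single:.2f}x")
```

The ratio (9 + 25)/9 ≈ 3.8 is independent of the channel counts, which matches the "up to 4× per block" figure quoted for KANICE below.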
Initialization and Integration
- He (Kaiming) normal initialization is used for convolution weights in KANICE (Ferdaus et al., 2024).
- BatchNorm, GELU/Swish activations, and residual connections are employed variously across implementations to stabilize training and improve convergence (Ng et al., 2022, An et al., 2019).
- Dropout and explicit regularization are not used within the ICB itself in current KANICE experiments; standard weight decay is global (Ferdaus et al., 2024).
4. Empirical Performance and Ablation Analysis
ICB integration yields consistent, sometimes substantial, improvements in accuracy and robustness across application domains.
Image Classification (KANICE)
| Model | MNIST | Fashion | EMNIST | SVHN |
|---|---|---|---|---|
| CNN (std) | 98.55 | 92.36 | 85.38 | 84.04 |
| ICB only | 98.98 | 92.05 | 86.43 | 86.70 |
| ICB_CNN | 98.92 | 92.94 | 87.00 | 89.60 |
Replacing the first conv with an ICB increases accuracy by +0.43% (MNIST), +1.05% (EMNIST), and +2.66% (SVHN). The hybrid "ICB_CNN" pattern performs better still, especially on the more challenging datasets (Ferdaus et al., 2024).
Speech/Audio Robustness (ConvMixer)
- Google Speech Commands V2-12 accuracy: 98.2% (<120K params).
- Far-field robustness (down to SNR –10 dB): 71.88% vs 64.5% for MatchboxNet of similar size.
- Removing the mixer from the ICB produces a ~7% accuracy decline at low SNRs, confirming that the global interaction is critical (Ng et al., 2022).
Time Series Tasks (TSLANet)
| Variant | FordA (ACC%) | UWaveGL (ACC%) | ETTh1 (MSE) | Exchange (MSE) |
|---|---|---|---|---|
| w/o ICB | 91.3 | 86.2 | 0.419 | 0.376 |
| Full TSLANet | 93.1 | 91.3 | 0.413 | 0.369 |
ICB removal consistently reduces classification accuracy (–1.8% / –5.1%) and modestly worsens forecasting error (Eldele et al., 2024).
Large-Scale Image Recognition (IC-Network)
- Integrating IC blocks into ResNet-50 reduces top-1 error from 22.85% to 21.49% (ImageNet 10-crop), outperforming ResNet-101 with fewer FLOPs.
- On CIFAR-10, gains of 0.3–0.9% are observed; however, very deep ICNets may require additional regularization (An et al., 2019).
5. Contextualization Across Domains and Use Cases
ICBs have been adapted and validated across image classification (Ferdaus et al., 2024, An et al., 2019), keyword spotting in noisy environments (Ng et al., 2022), and time-series learning (Eldele et al., 2024). Patterns emerge:
- In CNNs: Multi-branch and collision-based ICBs enhance discrimination and convergence at modest computational cost.
- In audio/sequential processing: Mixer-based ICBs compete with transformer architectures while maintaining small footprints.
- In time series: ICBs restore local detail lost to global spectral blocks, improving classification and forecasting by dynamic, scale-aware nonlinearity.
The architectural modularity of ICBs makes them suitable replacements or supplements to both standard convolution and transformer FFN/attention layers in settings where cross-scale, nonlinear, or global-local feature fusion is critical.
6. Implementation Considerations and Limitations
- Architecture: ICBs may be implemented via standard 2D/1D convolutions, mixer MLPs, or collision-based fusion, depending on domain and desired behavior.
- Kernel size and parameter scaling must be chosen relative to task statistics (e.g. sequence length, local vs. global feature prominence) (Eldele et al., 2024).
- ICBs typically increase parameters and FLOPs (up to 4× per block for KANICE), but selective deployment (only in first layers or as hybrid blocks) offers a favorable performance/efficiency trade-off.
- Very deep integration (e.g., overly many IC bottleneck blocks) may cause overfitting and mandate stronger regularization schemes (An et al., 2019).
- Certain ICBs (e.g., TSLANet's) do not contain residual skips by default; further empirical work could illuminate optimal placement and structure for arbitrary domains (Eldele et al., 2024).
7. References
- "KANICE: Kolmogorov-Arnold Networks with Interactive Convolutional Elements" (Ferdaus et al., 2024)
- "ConvMixer: Feature Interactive Convolution with Curriculum Learning for Small Footprint and Noisy Far-field Keyword Spotting" (Ng et al., 2022)
- "TSLANet: Rethinking Transformers for Time Series Representation Learning" (Eldele et al., 2024)
- "IC-Network: Efficient Structure for Convolutional Neural Networks" (An et al., 2019)