
Filter Skeleton (FS) in Neural Networks

Updated 22 February 2026
  • Filter Skeleton (FS) is a learnable scaling matrix that modulates individual sub-filter weights to enable dynamic pruning and refined feature aggregation in neural architectures.
  • It enables stripe-wise pruning in CNNs and Gaussian-based topology adjustment in GCNs, offering fine-grained control over parameter sparsity and structural refinement.
  • Empirical results show that FS methods achieve significant compression with minimal accuracy loss, facilitating hardware-efficient deployment and robust performance.

A filter skeleton (FS) is a learnable, structured scaling matrix integrated into neural architectures to modulate feature aggregation and enable dynamic pruning or selection of local structures within a high-dimensional filter or graph. The FS concept addresses the challenge of achieving fine-grained control over parameter sparsity or topological refinement while maintaining inference efficiency and strong empirical performance. It is central to both Stripe-Wise Pruning (SWP) for convolutional neural networks (CNNs), where FSs enable sub-filter granularity compression, and to Gaussian-based skeletal topology refinement in advanced gated graph convolutional networks (GCNs), where FS mechanisms regularize feature aggregation on skeleton graphs.

1. Mathematical Definitions and Core Mechanisms

In the context of convolutional neural networks, consider a convolutional layer $l$ with weight tensor $W^l \in \mathbb{R}^{N \times C \times K \times K}$, where $N$ is the number of output filters, $C$ is the input channel dimension, and $K$ the spatial kernel width. Each filter of shape $C \times K \times K$ is decomposed into $K^2$ stripes: $1 \times 1$ spatial sub-filters indexed by position $(i,j)$, each associated with a vector $f_{i,j} \in \mathbb{R}^C$. The filter skeleton $I^l \in \mathbb{R}^{N \times K \times K}$ assigns a single scalar to each stripe within each filter.

The effective weights for layer $l$ during forward propagation are given by
$$W'^{l}_{n,c,i,j} = I^l_{n,i,j} \cdot W^l_{n,c,i,j}.$$
This multiplicative modulation enables dynamic attenuation or enhancement of individual stripes inside conventional filters.
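The stripe-wise modulation above amounts to broadcasting the skeleton over the channel axis. A minimal NumPy sketch (shapes and values are illustrative, not taken from the SWP code):

```python
import numpy as np

# One conv layer: N output filters, C input channels, K x K kernels.
N, C, K = 8, 4, 3
rng = np.random.default_rng(0)
W = rng.standard_normal((N, C, K, K))   # weight tensor W^l
I = np.ones((N, K, K))                  # filter skeleton I^l: one scalar per stripe

# Effective weights W'_{n,c,i,j} = I_{n,i,j} * W_{n,c,i,j};
# I broadcasts over the channel axis c.
W_eff = I[:, None, :, :] * W

# Shrinking one entry of I attenuates that stripe across all C channels.
I[0, 1, 1] = 0.0
W_eff = I[:, None, :, :] * W
print(np.allclose(W_eff[0, :, 1, 1], 0.0))  # prints True: the stripe is zeroed
```

With the skeleton initialized to all ones, the layer is numerically unchanged; training then drives individual entries toward zero.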

On graph-structured data, as in G³CN for skeleton-based action recognition, a skeleton's topological adjacency is refined through a "Gaussian filter skeleton." The joint distance matrix $D \in \mathbb{R}^{n \times n}$ encodes shortest-path distances between graph nodes (joints); a Gaussian kernel $\phi_{ij} = \exp(-d_{ij}^2)$ provides a soft spatial weighting function. Integrating these filter skeletons yields a topology-refined adjacency matrix used for downstream gated aggregation:
$$\mathrm{Coe}_{ij} = \sum_{k=1}^{n} \phi_{kj} \cdot a'_{ik},$$
with further learned normalization and projection steps completing the skeleton-based correction of the adjacency.
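Since $\mathrm{Coe}_{ij} = \sum_k a'_{ik}\,\phi_{kj}$, the refinement is a matrix product $A'\Phi$. A toy sketch on a three-joint chain graph (the chain, the placeholder $A'$, and all values are assumptions for illustration):

```python
import numpy as np

# Toy skeleton graph: a chain of 3 joints, 0 - 1 - 2.
n = 3
# Shortest-path (hop) distances d_ij between joints.
D = np.array([[0, 1, 2],
              [1, 0, 1],
              [2, 1, 0]], dtype=float)

# Gaussian kernel phi_ij = exp(-d_ij^2): soft spatial weighting that
# decays quickly with graph distance.
Phi = np.exp(-D**2)

# A' stands in for the learned adjacency correction; identity here.
A_prime = np.eye(n)

# Coe_ij = sum_k phi_kj * a'_ik  ==  (A' @ Phi)_ij
Coe = A_prime @ Phi
```

With $A'$ set to the identity, `Coe` reduces to the Gaussian weights themselves; a learned $A'$ would mix these soft spatial weights across joints.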

2. Filter Skeleton Learning and Sparsification

Learning the FS involves joint optimization over the standard weights and the skeleton entries. In SWP, an $\ell_1$ penalty is imposed on each $I^l$ to encourage sparsity:
$$L_{\mathrm{total}} = \sum_{(x,y)} \mathrm{Loss}\big(f(x; \{ W^l \odot I^l \}), y\big) + \alpha \sum_{l=1}^L \|I^l\|_1.$$
The FS parameters act as a differentiable mask, with gradients computed as
$$\frac{\partial L}{\partial W^l_{n,c,i,j}} = I^l_{n,i,j} \cdot \frac{\partial L}{\partial W'^l_{n,c,i,j}},$$

$$\frac{\partial L}{\partial I^l_{n,i,j}} = \sum_{c=1}^C W^l_{n,c,i,j} \cdot \frac{\partial L}{\partial W'^l_{n,c,i,j}}.$$
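Both gradient identities follow from the chain rule through $W' = I \odot W$ and can be written as broadcasts and channel sums. A NumPy sketch with an arbitrary upstream gradient `G` standing in for $\partial L / \partial W'$ (all shapes and values are illustrative assumptions):

```python
import numpy as np

# Toy layer: N filters, C channels, K x K kernels.
rng = np.random.default_rng(2)
N, C, K = 2, 3, 3
W = rng.standard_normal((N, C, K, K))      # base weights W^l
I = rng.uniform(0.5, 1.5, size=(N, K, K))  # filter skeleton I^l

# G plays the role of the upstream gradient dL/dW'.
G = rng.standard_normal((N, C, K, K))

# dL/dW_{n,c,i,j} = I_{n,i,j} * dL/dW'_{n,c,i,j}
dL_dW = I[:, None, :, :] * G

# dL/dI_{n,i,j} = sum_c W_{n,c,i,j} * dL/dW'_{n,c,i,j}
dL_dI = (W * G).sum(axis=1)
```

Note that the skeleton gradient pools the upstream signal over all $C$ channels of a stripe, which is what lets a single scalar govern an entire stripe's importance.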

After optimization, all stripes with $I^l_{n,i,j} < \delta$ are pruned (frozen to zero), and the FS can be merged into the weights for deployment efficiency.
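The threshold-and-merge step is a masked broadcast multiply. A minimal sketch, assuming a skeleton that has already been trained to near-sparsity (the values of `I` and `delta` here are illustrative):

```python
import numpy as np

# Toy post-training state: some skeleton entries have been driven small.
N, C, K = 4, 2, 3
rng = np.random.default_rng(1)
W = rng.standard_normal((N, C, K, K))
I = rng.uniform(0.0, 1.0, size=(N, K, K))  # skeleton after l1-regularized training

delta = 0.3                       # pruning threshold (assumed value)
mask = I >= delta                 # stripes that survive
I_pruned = np.where(mask, I, 0.0)

# Merge the skeleton into the weights for deployment: W <- W ⊙ I.
W_deploy = I_pruned[:, None, :, :] * W

stripe_sparsity = 1.0 - mask.mean()  # fraction of stripes removed
```

After merging, inference uses `W_deploy` with a standard convolution; only the indices of surviving stripes need to be recorded to skip pruned work.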

In Gaussian skeleton graph models, the FS learning is implicit in the propagation of gradients through analytically defined Gaussian kernels and through the end-to-end training of adjacency correction parameters.

3. Comparison with Classical Pruning and Aggregation Methods

Traditional filter pruning in CNNs removes entire filters ($C \times K \times K$ parameters per filter), sacrificing both flexibility and representational capacity. The FS mechanism in SWP increases pruning granularity from $1$ to $K^2$ units per filter, allowing intermediate retention of salient spatial substructures. This $K^2$-fold increase in pruning resolution outperforms both group-wise and channel-wise approaches, neither of which adaptively learns individualized sub-filter shapes.
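The granularity gap is easy to quantify. For a typical $3 \times 3$ kernel with 64 input channels (values chosen for illustration):

```python
# Prunable units per filter for K x K kernels over C input channels:
# whole-filter pruning removes the entire C*K*K block as one unit,
# while stripe-wise pruning can remove any of K^2 stripes of C weights each.
C, K = 64, 3
filter_params = C * K * K    # 576 parameters, removed all-or-nothing
stripe_params = C            # 64 parameters per independently prunable stripe
num_stripes = K * K          # 9 prunable units per filter instead of 1
assert num_stripes * stripe_params == filter_params
```

The nine stripes together cover exactly the same parameters as the whole filter, so stripe-wise pruning strictly generalizes filter pruning: removing all nine stripes recovers the classical case.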

Similarly, in the G³CN framework, classical GCNs aggregate information using a fixed skeletal graph or uniformly parameterized adjacency matrices. Gaussian filter skeletons refine these topologies with learnable, sample-adaptive spatial weights, effectively "sculpting" the receptive field for each action class, and yielding superior differentiation of ambiguous human actions.

4. Algorithmic Workflows

A canonical procedure using FS in SWP for CNN pruning involves:

  1. Initialization of $I^l = \mathbf{1}$ for all layers.
  2. Joint training of $W^l$ and $I^l$ with sparsity regularization.
  3. Pruning of stripes $(n,i,j)$ where $I^l_{n,i,j} < \delta$.
  4. Merging of $I^l$ into $W^l$ (i.e., $W^l \leftarrow W^l \odot I^l$) and discarding of $I^l$.
  5. Optional light fine-tuning to recover potential accuracy loss.
  6. Inference via standard convolution, omitting pruned stripes.
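The steps above can be sketched as a single function. This is an illustrative outline, not the paper's implementation: `train_step` is an assumed caller-supplied closure that performs one optimizer update (including the $\ell_1$ term on $I$), and the function mutates `W` in place.

```python
import numpy as np

def swp_prune(W, train_step, alpha=1e-4, delta=0.05, steps=100):
    """Sketch of the SWP workflow over a dict of (N, C, K, K) weight tensors.

    train_step(W_eff, I, alpha) is assumed to update W and I from the
    task loss plus the l1 skeleton penalty; hyperparameters are placeholders.
    """
    # 1. Initialize I^l = 1 for every layer (one scalar per stripe).
    I = {l: np.ones((w.shape[0],) + w.shape[2:]) for l, w in W.items()}

    # 2. Joint training with sparsity regularization (delegated to caller).
    for _ in range(steps):
        W_eff = {l: I[l][:, None] * W[l] for l in W}
        train_step(W_eff, I, alpha)

    # 3-4. Prune stripes below delta, then merge I into W and discard I.
    for l in W:
        I[l][np.abs(I[l]) < delta] = 0.0
        W[l] = I[l][:, None] * W[l]

    # 5-6. (Optional light fine-tuning would go here; inference then uses
    # standard convolution, skipping the zeroed stripes.)
    return W

# Usage with a no-op train_step, so the skeleton stays at 1 and merging
# leaves the weights unchanged.
weights = {"conv1": np.full((2, 3, 3, 3), 0.5)}
pruned = swp_prune(weights, lambda W_eff, I, alpha: None, steps=1)
```

Because merging folds the skeleton into the weights, the deployed network needs no FS tensors at all, only per-layer indices of the retained stripes.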

For G³CN, the block structure alternates temporal convolution with spatial skeleton filtering, where the adjacency is dynamically refined using Gaussian-parameterized skeletons followed by gated message passing.

5. Empirical Performance and Hardware Efficiency

Filter skeleton-based SWP achieves state-of-the-art compression ratios without specialized hardware or custom kernels. For instance, on CIFAR-10 with a VGG16 backbone, SWP prunes 92.66% of parameters and 71.16% of FLOPs with only a 0.40% drop in accuracy (93.25% → 92.85%). On ResNet18/ImageNet, 54.6% of FLOPs are removed with negligible (<0.2%) or even negative (improved) top-1 accuracy change (Meng et al., 2020).

In skeleton-based GCNs, filter skeleton mechanisms result in substantial gains for ambiguous action classes (e.g., "Playing with phone/tablet": +8.2%; "Wear a shoe": +10.0% top-1 accuracy on NTU-60 cross-subject) and consistent overall improvements when plugged into diverse GCN backbones (Ren et al., 9 Sep 2025).

6. Structural Impact and Contextual Significance

By decoupling parameter importance at the sub-filter or sub-graph level, filter skeletons facilitate both adaptive model compression and interpretable topology refinement. FS modules serve as highly parameter-efficient surrogates for explicit mask learning or fixed manual pruning criteria. Filters trained under the FS regime exhibit sparser and empirically more stable weight distributions; notably, even when base weights are randomly fixed, learning only the FS can yield high classification accuracy, underscoring the representational power inherent to the skeleton itself (Meng et al., 2020).

In graph domains, incorporating a Gaussian filter skeleton sharpens weak spatial correlations, leading to sparse, selective feature aggregation and improved robustness for challenging classification scenarios.

7. Practical Deployment and Future Directions

FS-based methods, by operating at a hardware-compatible granularity and requiring only minor record-keeping (e.g., indices for retained stripes per layer), enable straightforward integration into existing inference pipelines without the need for custom acceleration libraries. A plausible implication is that further extensions of FS concepts—toward multi-dimensional or content-adaptive masking—could yield new architectures balancing interpretability, sparsity, and universality for both vision and structured data modalities. Moreover, the empirical observation that FS structure alone can drive meaningful predictions suggests a potential research avenue in explicit structural learning and modular network design.
