Multi-Fiber Networks for Video Recognition (1807.11195v3)

Published 30 Jul 2018 in cs.CV

Abstract: In this paper, we aim to reduce the computational cost of spatio-temporal deep neural networks, making them run as fast as their 2D counterparts while preserving state-of-the-art accuracy on video recognition benchmarks. To this end, we present the novel Multi-Fiber architecture that slices a complex neural network into an ensemble of lightweight networks or fibers that run through the network. To facilitate information flow between fibers we further incorporate multiplexer modules and end up with an architecture that reduces the computational cost of 3D networks by an order of magnitude, while increasing recognition performance at the same time. Extensive experimental results show that our multi-fiber architecture significantly boosts the efficiency of existing convolution networks for both image and video recognition tasks, achieving state-of-the-art performance on UCF-101, HMDB-51 and Kinetics datasets. Our proposed model requires over 9x and 13x less computations than the I3D and R(2+1)D models, respectively, yet providing higher accuracy.

Citations (215)


Summary

The paper presents Multi-Fiber Networks that decompose complex 3D CNNs into lightweight, sparsely connected fibers to reduce computational burden.
Multiplexer modules enable efficient cross-fiber feature exchange, increasing overall model capacity at little additional computational cost.
The approach delivers state-of-the-art results on video benchmarks like Kinetics, UCF-101, and HMDB-51, proving its practical efficiency for spatio-temporal recognition.

Overview of Multi-Fiber Networks for Video Recognition

The paper "Multi-Fiber Networks for Video Recognition" addresses the challenge of high computational costs inherent in spatio-temporal deep neural networks, particularly those employing 3D convolutions for video analysis. The authors propose a novel architecture named Multi-Fiber Networks, which optimizes computational efficiency while maintaining competitive accuracy levels on prominent video recognition benchmarks such as UCF-101, HMDB-51, and Kinetics.

The Multi-Fiber architecture decomposes a complex neural network into an ensemble of lightweight networks, termed "fibers," that run through the network in parallel. Because the fibers are only sparsely connected to one another, the computational load drops significantly. A key innovation is the multiplexer module, which routes information between fibers and so increases the model's overall capacity at little additional computational cost. Experiments support this design: the proposed model requires over 9x fewer computations than I3D and over 13x fewer than R(2+1)D while achieving higher accuracy.
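To make the saving from slicing concrete, the sketch below counts channel-to-channel connections in two stacked layers, first fully connected and then split into isolated fibers. It is a back-of-the-envelope illustration rather than the authors' code: the function names and the channel widths are hypothetical, and the count ignores kernel size, which scales both cases equally.

```python
def dense_connections(m_in, m_mid, m_out):
    """Connections in two stacked, fully connected channel-mixing layers."""
    return m_in * m_mid + m_mid * m_out


def fiber_connections(m_in, m_mid, m_out, num_fibers):
    """Connections after slicing the same two layers into isolated fibers."""
    for width in (m_in, m_mid, m_out):
        if width % num_fibers != 0:
            raise ValueError("channel widths must be divisible by num_fibers")
    # Each fiber is a narrow copy of the dense unit with 1/num_fibers the width.
    per_fiber = dense_connections(
        m_in // num_fibers, m_mid // num_fibers, m_out // num_fibers
    )
    return num_fibers * per_fiber


if __name__ == "__main__":
    m_in, m_mid, m_out = 256, 64, 256  # hypothetical widths, not from the paper
    dense = dense_connections(m_in, m_mid, m_out)
    for n in (1, 4, 16):
        sliced = fiber_connections(m_in, m_mid, m_out, n)
        print(f"fibers={n:2d} connections={sliced:6d} reduction={dense / sliced:.0f}x")
```

Under these assumptions the connection count falls in proportion to the number of fibers, since each fiber connects only 1/N of the inputs to 1/N of the outputs. The price is that fibers no longer see each other's features, which is the gap the multiplexer is meant to close.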

Key Contributions

Efficiency Enhancement: The Multi-Fiber architecture significantly reduces the computational burden of 3D CNNs by using sparse connectivity within the network layers. This approach departs from traditional ways of reducing 3D convolution complexity, such as filter decomposition, by addressing the inefficiencies stemming from large input tensors.
Integration of Multiplexer Modules: The multiplexer modules are designed to enable cross-fiber communication, allowing for the exchange and integration of features across the sparsely connected fibers. This mechanism compensates for potential information loss due to sparse connections and enhances the overall learning capacity of the model.
Spatio-Temporal Video Recognition: The proposed architecture has been adapted to support spatio-temporal inputs, and it derives its effectiveness from efficiently capturing motion features and temporal dependencies. The Multi-Fiber Networks achieve state-of-the-art results on the Kinetics, UCF-101, and HMDB-51 datasets, validating their efficacy relative to computational cost.
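The role of the multiplexer can be illustrated with a toy experiment on a single channel vector. The sketch below is an assumption-laden stand-in, not the paper's module: it models each fiber as a small independent weight matrix and the multiplexer as two dense pointwise maps (a channel reduction followed by an expansion) with a residual add. The reduction ratio, the residual add, the sizes, and all function names are hypothetical choices made for illustration. It perturbs one channel in fiber 0 and reports which fibers' outputs change.

```python
import random


def matvec(weights, x):
    """Multiply a (rows x cols) weight matrix by a channel vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of uniform random weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def fiber_layer(fiber_weights, x):
    """Apply one small weight matrix per fiber to that fiber's channel slice only."""
    width = len(x) // len(fiber_weights)
    out = []
    for i, weights in enumerate(fiber_weights):
        out.extend(matvec(weights, x[i * width:(i + 1) * width]))
    return out


def multiplexer(reduce_w, expand_w, x):
    """Dense pointwise mixing (assumed form): squeeze, expand back, add residual."""
    mixed = matvec(expand_w, matvec(reduce_w, x))
    return [a + b for a, b in zip(x, mixed)]


def changed_fibers(before, after, num_fibers, tol=1e-12):
    """Return the indices of fibers whose outputs differ between two runs."""
    width = len(before) // num_fibers
    return [
        i for i in range(num_fibers)
        if any(abs(a - b) > tol
               for a, b in zip(before[i * width:(i + 1) * width],
                               after[i * width:(i + 1) * width]))
    ]


rng = random.Random(0)
channels, num_fibers = 8, 4  # hypothetical sizes
width = channels // num_fibers
fibers = [random_matrix(width, width, rng) for _ in range(num_fibers)]
reduce_w = random_matrix(channels // 4, channels, rng)  # assumed 4x reduction
expand_w = random_matrix(channels, channels // 4, rng)

x = [rng.uniform(-1.0, 1.0) for _ in range(channels)]
x_perturbed = list(x)
x_perturbed[0] += 1.0  # touch a channel that belongs to fiber 0 only

isolated = changed_fibers(fiber_layer(fibers, x),
                          fiber_layer(fibers, x_perturbed), num_fibers)
routed = changed_fibers(
    fiber_layer(fibers, multiplexer(reduce_w, expand_w, x)),
    fiber_layer(fibers, multiplexer(reduce_w, expand_w, x_perturbed)),
    num_fibers)
print("fibers affected without multiplexer:", isolated)
print("fibers affected with multiplexer:   ", routed)
```

Without the multiplexer only fiber 0 responds to the perturbation, because the fibers share no connections. With the dense mixing step in front, the change should generally reach the other fibers as well, which is the cross-fiber exchange described above.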

Implications and Future Directions

The paper presents strong empirical evidence that Multi-Fiber Networks offer a practical solution for deploying resource-efficient 3D CNNs in real-world video recognition applications. The demonstrated reduction in computational costs positions this model as a viable candidate for deployment in environments where computational resources are constrained.

From a theoretical standpoint, this work opens avenues for the exploration of sparse network topologies beyond video recognition, potentially extending to other domains with high dimensional inputs, such as volumetric data processing in 3D medical imaging or high-dimensional genomics data.

Future work could optimize the multiplexer modules, either by refining their design or by integrating alternative methods for fiber interaction, to further improve performance. Measuring the parallelization efficiency and inference speed of Multi-Fiber Networks on different hardware could also provide valuable insights for widespread adoption.

While the current research focuses on video recognition, the principles underpinning the Multi-Fiber Networks have broader implications for network architecture design, especially in the emerging field of efficient neural networks. By innovating on the structural frameworks of these networks, the paper contributes to ongoing efforts to balance computational efficiency and performance in deep learning systems.