Volterra Neural Networks (VNNs)

Published 21 Oct 2019 in cs.CV, cs.LG, and eess.IV | (1910.09616v5)

Abstract: The importance of inference in Machine Learning (ML) has led to an explosive number of different proposals in ML, and particularly in Deep Learning. In an attempt to reduce the complexity of Convolutional Neural Networks, we propose a Volterra filter-inspired Network architecture. This architecture introduces controlled non-linearities in the form of interactions between the delayed input samples of data. We propose a cascaded implementation of Volterra Filtering so as to significantly reduce the number of parameters required to carry out the same classification task as that of a conventional Neural Network. We demonstrate an efficient parallel implementation of this Volterra Neural Network (VNN), along with its remarkable performance while retaining a relatively simpler and potentially more tractable structure. Furthermore, we show a rather sophisticated adaptation of this network to nonlinearly fuse the RGB (spatial) information and the Optical Flow (temporal) information of a video sequence for action recognition. The proposed approach is evaluated on UCF-101 and HMDB-51 datasets for action recognition, and is shown to outperform state of the art CNN approaches.

Abstract PDF HTML Upgrade to Chat

Authors (2)

References (48)

Citations (9)

View on Semantic Scholar

Summary

The paper demonstrates how VNNs leverage Volterra series to introduce controlled non-linearities, achieving enhanced performance in action recognition and image generation.
It details a nested and overlapping filter architecture that optimizes parameter efficiency and reduces computational complexity compared to CNNs and LSTMs.
The study validates VNNs' robustness to noise and frame rate variations while showcasing versatility in applications including GAN-based image generation.

Volterra Neural Networks (VNNs): Technical Overview

Volterra Neural Networks (VNNs) introduce a novel approach to neural network architectures inspired by Volterra series and filters, aiming to tackle the complexity and computational challenges inherent in Convolutional Neural Networks (CNNs). This essay provides an expert analysis of the VNN framework, exploring its design, implementation, and performance in action recognition and image generation tasks.

Volterra Filter-based Architecture

Theoretical Basis: Volterra Series

The Volterra series serves as a foundational mathematical tool for modeling non-linear systems, representing systems using a series of integrals. In the context of VNNs, the series introduces controlled non-linearities through interactions between delayed data samples. The architecture leverages these principles to implement a Volterra-filter-inspired network, with the filters structured in cascades to explore higher-order terms without excessive parameterization.

Figure 1: Adaptive Volterra Filter.

Nested and Overlapping Volterra Filters

The nested Volterra filter architecture allows a structured layered approach reminiscent of CNNs but significantly optimizes parameter efficiency. The overlapping Volterra Neural Network (O-VNN) implementation provides a detailed block diagram that elucidates how cascaded filters manage high-order interactions.

Figure 3: Block diagram for an Overlapping Volterra Neural Network.

Comparison with Conventional Methods

Relation to CNNs and LSTMs

While CNNs introduce non-linearities via activation functions, VNNs achieve this through higher-order Volterra terms, resulting in potentially more fine-grained learning. The comparison extends to the capabilities of VNNs versus Long-Short Term Memory networks (LSTMs), where VNNs inherently model temporal dynamics without the explicit use of activations, hence learning approximations of activation functions inherently.

Computational Efficiency

A key advantage of the VNN architecture is the substantial reduction in computational complexity compared to traditional CNN models. This efficiency is critical when considering deployment on resource-constrained devices, where VNNs maintain remarkable performance with fewer parameters, as shown in various datasets like UCF-101 and HMDB-51.

Practical Implications and Performance

Action Recognition

The VNN model demonstrated competitive performance across standard action recognition datasets. Notably, the implementation outperformed state-of-the-art models when trained from scratch, emphasizing its potential for tasks with limited training data. The non-linear fusion of spatial and temporal information streams further supports improved accuracy, revealing intrinsic relationships between different data modalities.

Figure 5: (a) Input Video, (b) Features extracted by only RGB stream, (c) Features extracted by Two-Stream Volterra Filtering.

Robustness to Noise and Frame Rate Variations

VNNs exhibit enhanced robustness to Gaussian noise and variations in frame rates, highlighting their capability to generalize better under sub-optimal conditions, where traditional CNNs might falter.

Figure 2: Performance comparison between VNN and CNN implementations when noise is added to the input videos.

Image Generation with GANs

The VNN framework's adaptability extends to generative models like GANs, where it contributes to generating high-quality images using datasets such as CIFAR-10. This application further underscores the versatility of Volterra-based architectures in addressing diverse AI challenges.

Figure 8: Generated images using Cifar10 Dataset.

Conclusion

Volterra Neural Networks offer a promising alternative to conventional deep learning architectures, particularly excelling in tasks demanding efficient computation and robust modeling capabilities. By harnessing the nuanced capabilities of Volterra series, VNNs deliver enhanced performance in video action recognition and generative tasks, paving the way for future developments in AI systems that combine reduced complexity with high accuracy. Further exploration and optimization of VNNs could lead to broader adoption in varying domains where traditional models remain impractical.

Markdown Report Issue