- The paper introduces MONet, a network that replaces traditional activation functions with multilinear Mu-Layers to capture multiplicative interactions, yielding roughly a 10% top-1 accuracy improvement over prior polynomial networks on ImageNet.
- Its core methodology stacks Mu-Layers into Poly-Blocks, enabling the network to capture complex interactions among input elements up to fourth degree.
- Empirical tests demonstrate MONet's robust performance across diverse benchmarks, including standard, fine-grained, and corrupted image datasets.
Introduction
The exploration of alternatives to activation functions within deep neural networks has been an area of active research. Activation functions introduce the non-linearity that networks need to learn complex patterns, but they also bring challenges of their own, for example in privacy-preserving settings such as homomorphic encryption, where non-polynomial operations are expensive to evaluate. Polynomial Networks (PNs) present an interesting direction: they require no activation functions and instead rely on polynomial expansions of the input. The paper introduces a new class of PNs, the Multilinear Operator Network (MONet), which uses purely multilinear operations to establish multiplicative interactions within an input token.
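To make the contrast concrete, here is a minimal sketch (illustrative only, not the paper's exact formulation) of a degree-2 polynomial layer whose non-linearity comes from an elementwise product of two linear projections rather than an activation function; the names `poly_layer`, `W1`, and `W2` are our own:

```python
import torch

def poly_layer(x, W1, W2, b):
    # Degree-2 polynomial map: the non-linearity comes from the
    # elementwise (Hadamard) product of two linear projections of x,
    # not from an activation function.
    return (x @ W1) * (x @ W2) + b

x = torch.randn(8, 16)         # batch of 8 inputs with 16 features each
W1 = 0.1 * torch.randn(16, 32)
W2 = 0.1 * torch.randn(16, 32)
b = torch.zeros(32)
y = poly_layer(x, W1, W2, b)   # each output entry is a quadratic polynomial of x
```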
MONet: Core Advancements
The authors propose the Mu-Layer, a key building block designed to capture multiplicative interactions through purely multilinear operations. MONet, the resulting architecture, stacks multiple Mu-Layers into a Poly-Block, which enables the network to model high-degree interactions between input elements. A central claim of the paper is that, for the first time, a multilinear operator-based network achieves performance on par with modern neural network architectures; this is substantiated with empirical results showing that MONet outperforms previous polynomial networks on standard image recognition benchmarks.
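The paper gives the Mu-Layer's exact parameterization; as a hedged approximation, one can picture two linear branches whose outputs are multiplied elementwise, plus a linear skip branch, making the layer a degree-2 polynomial of its input. The sketch below, including the class names `MuLayerSketch` and `PolyBlockSketch`, is our illustrative reading, not the authors' reference implementation:

```python
import torch
import torch.nn as nn

class MuLayerSketch(nn.Module):
    """Illustrative Mu-Layer: elementwise product of two linear
    branches plus a linear skip term (a degree-2 polynomial in x)."""

    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.branch_a = nn.Linear(dim_in, dim_out)
        self.branch_b = nn.Linear(dim_in, dim_out)
        self.skip = nn.Linear(dim_in, dim_out)

    def forward(self, x):
        # Multiplicative interaction between the two branches,
        # stabilized by a linear skip connection.
        return self.branch_a(x) * self.branch_b(x) + self.skip(x)

class PolyBlockSketch(nn.Module):
    """Two stacked Mu-Layers: composing two degree-2 maps yields
    interactions up to degree 4, matching the paper's claim."""

    def __init__(self, dim):
        super().__init__()
        self.mu1 = MuLayerSketch(dim, dim)
        self.mu2 = MuLayerSketch(dim, dim)

    def forward(self, x):
        return self.mu2(self.mu1(x))

block = PolyBlockSketch(64)
tokens = torch.randn(4, 64)
out = block(tokens)  # degree-4 polynomial features of the input tokens
```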
Experiments and Results
A thorough evaluation of MONet against other architectures is provided. The authors benchmark against conventional MLP models and polynomial-based models, as well as canonical architectures such as ResNet and MLP-Mixer. The experiments span large-scale image classification on ImageNet1K, classification on smaller-scale datasets such as CIFAR and SVHN, and medical imaging on MedMNIST. Results indicate that MONet improves top-1 accuracy by approximately 10% over prior PNs. It also demonstrates robustness to visual corruptions, surpassing other models in the 'Weather' and 'Digital' corruption categories of ImageNet-C.
Theoretical Insights and Future Work
The paper explores the theoretical underpinnings of MONet, analyzing the types of interactions it can capture. The authors prove that the Mu-Layer captures multiplicative interactions among input elements and that the Poly-Block captures interactions up to fourth degree. Despite these insights, a full theoretical characterization of the polynomial expansions that MONet can express remains open, pointing to possible future research directions.
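The intuition behind the fourth-degree bound can be stated in one line (our informal reading, with A, B, C as illustrative weight matrices and ⊙ the Hadamard product, not the paper's formal proof):

```latex
f_{\mathrm{Mu}}(x) = (Ax) \odot (Bx) + Cx
\;\Rightarrow\; \deg f_{\mathrm{Mu}} \le 2,
\qquad
\deg\!\left(f^{(2)}_{\mathrm{Mu}} \circ f^{(1)}_{\mathrm{Mu}}\right) \le 2 \cdot 2 = 4.
```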
Conclusion
MONet charts an exciting avenue for deep learning architectures by eschewing activation functions and effectively harnessing multilinear operations. With the source code promised upon paper acceptance, the authors not only demonstrate MONet's potential in terms of performance but also open the door to further refinement and application across domains of AI. The model's interpretability is an added advantage, especially in applications such as scientific computing, where polynomial neural ODE solvers benefit from transparent model structures.