- The paper introduces a novel activation function, designed with topological principles and evaluated via Betti numbers, that simplifies the topology of the data as it passes through the network and reduces the number of training epochs by a factor of 1.5 to 2.
- The paper presents a systematic pruning technique that removes filters whose outputs exhibit Betti numbers above 300, reducing model complexity with minimal accuracy loss.
- The study demonstrates that integrating topological insights into DNN design enhances convergence speed and computational efficiency for diverse tasks.
Designing Activation Functions and Model Pruning Techniques Using Topological Analysis
Introduction
Deep Neural Networks (DNNs) have played a pivotal role in advancing the state-of-the-art across various domains like computer vision, speech recognition, and natural language processing. A crucial aspect of DNN architecture that influences its performance is the selection of activation functions and the network's structure, including the process of model pruning. This paper explores the application of topological concepts to develop a novel activation function aimed at accelerating training convergence and proposes a systematic approach for model pruning based on the topology of data transformations across network layers.
Topological Framework for Activation Function Design
Novel Activation Function
The paper introduces a new activation function premised on principles of topological transformation. The goal was to design an activation function that achieves faster convergence on classification tasks by reducing the topological complexity of the training data as it progresses through the network layers. To this end, Betti numbers (topological invariants that quantify the complexity of a topological space) were employed to gauge the effectiveness of the proposed activation function relative to traditional choices such as ReLU and Sigmoid.
Empirical evaluations on binary classification tasks with Multi-Layer Perceptrons (MLPs) showed that the novel activation function reduced Betti numbers more rapidly from layer to layer than ReLU and Sigmoid. This indicated a quicker topological simplification of the training data and translated into a reduction in the number of required training epochs by a factor of 1.5 to 2.
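To make the measurement concrete, the sketch below shows one way layer-wise Betti numbers could be estimated from hidden-layer activations using persistent homology. The paper's exact computation pipeline is not reproduced here; the use of the `ripser` library and the persistence cutoff are illustrative assumptions.

```python
# Minimal sketch (not the paper's protocol): estimate Betti numbers of the
# point cloud formed by one class's activations at a given layer, by counting
# persistence-diagram features whose lifetime exceeds a cutoff.
import numpy as np
from ripser import ripser

def approx_betti(points, maxdim=1, persistence_cutoff=0.1):
    """Approximate Betti numbers b_0..b_maxdim of a point cloud."""
    dgms = ripser(points, maxdim=maxdim)['dgms']
    betti = []
    for dgm in dgms:
        lifetimes = dgm[:, 1] - dgm[:, 0]   # death minus birth (inf for essential classes)
        betti.append(int(np.sum(lifetimes > persistence_cutoff)))
    return betti

# Usage: collect the activations of one class's samples at each layer and
# track how the Betti numbers shrink with depth.
# for depth, acts in enumerate(activations_per_layer):   # acts: (n_samples, n_units)
#     print(f"layer {depth}: Betti numbers ~ {approx_betti(acts)}")
```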
Implementation Insights
The activation function was crafted by incorporating discontinuities and multiple many-to-one mappings. This configuration was hypothesized to pull samples of the same class closer together, thereby decreasing the topological complexity of the data associated with each class. Experiments on popular datasets such as Fashion-MNIST, CIFAR-10, and cats-vs-dogs images supported this hypothesis, showing faster convergence without compromising accuracy.
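The exact functional form of the activation is not reproduced in this summary, so the snippet below is only a hypothetical illustration of the two properties described above: a discontinuity and a many-to-one fold, sketched as a PyTorch module. The `FoldedActivation` name, the sawtooth choice, and the example MLP are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class FoldedActivation(nn.Module):
    """Illustrative discontinuous, many-to-one activation: a sawtooth fold."""
    def __init__(self, period: float = 2.0):
        super().__init__()
        self.period = period

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # torch.remainder folds every period onto [0, period): many distinct
        # inputs share one output, and the map jumps at multiples of the period.
        return torch.remainder(x, self.period)

# Example: swapping it in for ReLU in a small MLP for binary classification.
mlp = nn.Sequential(
    nn.Linear(784, 256), FoldedActivation(),
    nn.Linear(256, 64),  FoldedActivation(),
    nn.Linear(64, 2),
)
```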
Systematic Approach to Model Pruning
Pruning Strategy
In parallel with optimizing the activation function, the paper proposes a novel methodology for model pruning. This process aims to streamline the trained model by eliminating filters that contribute to higher topological complexity, as measured by Betti numbers. The technique asserts that filters whose outputs yield large Betti numbers contribute little to the model's predictive accuracy and can therefore be pruned to improve computational efficiency and reduce the memory footprint.
The empirical evaluation of the pruning strategy was conducted on Convolutional Neural Networks (CNNs) trained on benchmark image datasets. The results showed that filters characterized by Betti numbers greater than 300 could be removed with minimal impact on accuracy, yielding faster inference and significantly smaller models.
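As a rough illustration of this pruning rule, the sketch below zeroes out convolutional filters whose estimated Betti numbers exceed the reported threshold of 300. How per-filter Betti numbers are obtained and how pruned filters are physically removed are not detailed in this summary; the helper simply assumes they have been estimated beforehand (for instance with `approx_betti` from the earlier sketch) and masks the filters in place.

```python
import torch
import torch.nn as nn

BETTI_THRESHOLD = 300  # filters with estimated Betti numbers above this are pruned

def prune_conv_layer(conv: nn.Conv2d, filter_betti) -> int:
    """Zero out filters of `conv` whose estimated Betti number exceeds the
    threshold; returns the number of pruned filters."""
    pruned = 0
    with torch.no_grad():
        for i, betti in enumerate(filter_betti):
            if betti > BETTI_THRESHOLD:
                conv.weight[i].zero_()        # wipe the i-th output filter
                if conv.bias is not None:
                    conv.bias[i] = 0.0
                pruned += 1
    return pruned

# Usage (assuming `betti_per_filter` was estimated on a validation set):
# n_pruned = prune_conv_layer(model.conv1, betti_per_filter)
```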
Implications and Future Directions
The application of topological analysis for the design of activation functions and model pruning presents a novel perspective in the optimization of neural network architectures. This approach underscores the potential of leveraging mathematical properties of data transformations across network layers to inform the architectural decisions in DNN design.
Looking ahead, the integration of topological concepts into more aspects of neural network architecture and training could further enhance the efficiency and efficacy of DNNs. Future research could explore the extension of these techniques beyond binary classification tasks and investigate their applicability across a broader spectrum of machine learning challenges. The promising results of this paper lay the groundwork for further exploration and validation across diverse datasets and problem domains, potentially leading to more generalized and topologically informed guidelines for neural network design and optimization.
Conclusion
This paper brings to the forefront the underexplored potential of topological analysis in the context of deep learning. By bridging the gap between mathematical topology and neural network architecture, it successfully demonstrates how topological measures can inform the design of more efficient activation functions and systematic model pruning techniques. The findings pave the way for novel approaches to neural network optimization, potentially leading to more effective and computationally efficient models for a wide array of tasks in machine learning.