- The paper introduces Deep Adaptation Networks (DAN) to mitigate catastrophic forgetting by recombining existing filters for new tasks.
- It proposes controller modules, built from linear transformations over existing filters, that add roughly 13% of the original parameters per new task, reducible to about 3% with quantization.
- Experimental results on benchmarks like the Visual Decathlon Challenge show that DAN matches or outperforms traditional fine-tuning without joint training.
Incremental Learning Through Deep Adaptation
The paper "Incremental Learning Through Deep Adaptation" introduces an innovative approach to incremental learning in neural networks, termed Deep Adaptation Networks (DAN). The primary objective is to enable neural networks to acquire new skills without compromising performance on previously learned tasks—a challenge traditionally associated with catastrophic forgetting in neural networks. Existing methodologies often present trade-offs between parameter efficiency and task retention, with some necessitating the storage of previous data or resulting in a substantial increase in model size. In contrast, the DAN framework circumvents these issues by allowing new filters to emerge as linear combinations of existing ones while retaining the original model's performance.
Methodology
The core proposal augments a pre-trained network with controller modules that adaptively recombine its existing filters. Each controller applies a linear transformation to the frozen filters to produce new filters suited to the additional task. Because only these transformations are learned, the per-task overhead is a small fraction of the original parameters (around 13%, depending on the architecture), compared to standard fine-tuning, which effectively duplicates the full parameter set for each task. When complemented with standard network quantization, the overhead can be reduced to approximately 3% without a significant drop in accuracy.
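To make the controller idea concrete, here is a minimal PyTorch sketch of a convolution whose task-specific filters are linear combinations of a frozen, pre-trained filter bank. The class name `ControllerConv2d` and its internals are illustrative assumptions rather than the authors' reference implementation, and it assumes `groups=1` and `dilation=1` for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ControllerConv2d(nn.Module):
    """Illustrative DAN-style controller (not the paper's reference code):
    new filters are linear recombinations of a frozen pre-trained layer."""

    def __init__(self, base_conv: nn.Conv2d):
        super().__init__()
        self.base_conv = base_conv                   # pre-trained layer, kept frozen
        for p in self.base_conv.parameters():
            p.requires_grad = False
        c_out = base_conv.out_channels
        # Task-specific mixing matrix: one row per new filter, one column per
        # original filter. Only this (and an optional bias) is trained per task.
        self.mix = nn.Parameter(torch.eye(c_out))
        self.bias = (nn.Parameter(torch.zeros(c_out))
                     if base_conv.bias is not None else None)

    def forward(self, x):
        w0 = self.base_conv.weight                   # (c_out, c_in, k, k)
        c_out = w0.shape[0]
        # New filters = mixing matrix applied to the flattened frozen filters.
        w_new = (self.mix @ w0.reshape(c_out, -1)).reshape_as(w0)
        return F.conv2d(x, w_new, self.bias,
                        stride=self.base_conv.stride,
                        padding=self.base_conv.padding)
```

For a 3x3 convolution with roughly equal input and output channel counts, the mixing matrix holds about c_out^2 parameters versus c_out * c_in * 9 in the original filters, i.e. on the order of a tenth of the layer, which is in line with the roughly 13% overall overhead reported in the paper.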
The DAN framework also offers flexible control over task processing by modulating network behavior with a switching variable, enabling a seamless switch between representations learned for different domains. Crucially, this method supports adding arbitrary tasks incrementally without requiring joint training, addressing a key limitation in scenarios where data from previous tasks is no longer available.
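As one way to picture the switching variable, the sketch below selects a task's filters at inference time: the original task keeps the frozen filters, so its accuracy is preserved exactly, while each added task uses its own learned mixing matrix. The function name, the `mixers` dictionary, and the string task labels are hypothetical and chosen only for illustration.

```python
from typing import Dict
import torch

def filters_for_task(base_weight: torch.Tensor,
                     mixers: Dict[str, torch.Tensor],
                     task: str) -> torch.Tensor:
    """Return the convolution filters to use for `task`.

    base_weight: frozen pre-trained filter bank of shape (c_out, c_in, k, k).
    mixers: maps each added task to its learned (c_out, c_out) mixing matrix.
    """
    if task == "original":
        return base_weight                 # old task: untouched filters
    c_out = base_weight.shape[0]
    mix = mixers[task]                     # the per-task "switch" selects a mixer
    return (mix @ base_weight.reshape(c_out, -1)).reshape_as(base_weight)
```

In practice the switch would be set once per input batch according to which dataset the batch comes from.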
Experimental Validation
The paper presents extensive experiments on diverse image classification datasets to substantiate the effectiveness of the DAN framework. The reported results show that DANs match or outperform traditional fine-tuning in accuracy while adding far fewer parameters per task. The authors verify performance on multiple benchmarks, including the Visual Decathlon Challenge, demonstrating the versatility and robustness of the approach across datasets and network architectures. In particular, DAN achieves competitive Decathlon scores without the joint training or dataset-dependent configuration relied upon by the Residual Adapters technique.
Implications and Future Prospects
This research extends transfer learning with a mechanism that keeps the per-task parameter cost low while maintaining model accuracy across several domains. Combined with network quantization techniques, the approach presents a promising direction for deploying adaptive neural networks in resource-constrained environments where storage and computation are limited. The ability to integrate datasets one at a time is particularly advantageous for real-world applications where data becomes available incrementally rather than as a single batch.
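The reduction of the per-task overhead from roughly 13% to about 3% comes from storing the controller coefficients at reduced precision. As a generic illustration only (the paper's exact quantization scheme is not reproduced here), a min-max uniform quantizer along the following lines cuts storage by roughly a factor of 32 / num_bits for 32-bit float coefficients:

```python
import numpy as np

def quantize_uniform(coeffs: np.ndarray, num_bits: int = 8):
    """Generic min-max uniform quantizer (an illustrative assumption, not the
    paper's scheme). Stores float coefficients as num_bits-bit integer codes
    plus a scale and offset; 8-bit codes shrink 32-bit floats about 4x,
    consistent with the ~13% -> ~3% overhead reduction described above."""
    lo, hi = float(coeffs.min()), float(coeffs.max())
    levels = 2 ** num_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((coeffs - lo) / scale).astype(np.uint16)
    dequantized = codes.astype(np.float32) * scale + lo   # used at inference
    return codes, scale, lo, dequantized
```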
Future developments in AI could explore more generalized methods for estimating linear combinations of filters, potentially by leveraging advanced heuristics or meta-learning processes to automatically adjust the DAN's structures for optimal task transitions. Additionally, exploring other domains such as natural language processing or sensor-based applications could reveal the broader applicability of DANs beyond image classification, further extending the paradigm of incremental learning in AI systems.
Overall, the research outlined in this paper carves out a path for optimizing memory and computation in large-scale neural networks while preserving the accuracy and adaptability required for incremental learning paradigms.