- The paper introduces deep SLDA integrated with CNNs to enable single-pass streaming learning, significantly reducing catastrophic forgetting.
- The paper demonstrates superior accuracy on datasets such as ImageNet and CORe50, with training up to 100 times faster and lower memory usage than incremental batch baselines.
- The paper highlights deep SLDA’s adaptability in real-time, resource-constrained, and non-iid environments, paving the way for embedded applications.
Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis
The paper presents a novel approach to lifelong machine learning that integrates streaming linear discriminant analysis with deep neural networks (deep SLDA), addressing the catastrophic forgetting that conventional deep neural networks suffer in incremental learning scenarios. This integration enables streaming learning, which offers distinct advantages over the commonly used incremental batch learning approaches.
Conventional deep neural networks struggle with catastrophic interference when learning tasks sequentially: as new information is introduced, previously acquired knowledge tends to be overwritten, significantly degrading performance on earlier tasks. Traditional methods for managing this limitation usually rely on incremental batch learning, which requires accumulating large batches before learning can occur. While effective in some contexts, incremental batch methods are ill-suited to real-time or near-real-time learning environments because they depend on storing and rehearsing previous data batches.
Deep SLDA offers a promising alternative because it embodies streaming learning, in which instances are processed one at a time and continuously, even in resource-constrained environments. Unlike batch-based learning, a streaming learner never revisits previous examples; it learns from each datum as it arrives, in a single pass. This is akin to how humans acquire new skills and knowledge without revisiting prior lessons every time a new concept is learned.
The key innovation of this research is placing streaming linear discriminant analysis at the output layer of a convolutional neural network, keeping memory requirements light and computation efficient. Experiments on large-scale image classification datasets such as ImageNet ILSVRC-2012 and CORe50 show that deep SLDA not only competes favorably with, but also surpasses, existing state-of-the-art methods that rely on incremental batch learning.
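Below is a minimal NumPy sketch of such a streaming LDA output layer. It illustrates the general SLDA idea rather than the authors' implementation, and the names used here (`StreamingLDAHead`, `fit_one`, `predict`, the `shrinkage` parameter) are assumptions for this sketch: the head keeps one running mean per class and a single shared covariance matrix, both updated from each incoming feature vector exactly once.

```python
import numpy as np

class StreamingLDAHead:
    """Streaming LDA output layer: per-class running means plus one shared
    covariance matrix, updated from one labeled feature vector at a time."""

    def __init__(self, feature_dim, num_classes, shrinkage=1e-4):
        self.means = np.zeros((num_classes, feature_dim))  # one mean vector per class
        self.counts = np.zeros(num_classes)                # examples seen per class
        self.cov = np.zeros((feature_dim, feature_dim))    # shared covariance estimate
        self.total = 0                                     # total examples seen
        self.shrinkage = shrinkage                         # regularizer for the inversion

    def fit_one(self, z, y):
        """Single-pass update from one feature vector z with integer label y."""
        if self.total > 0:
            # Rank-one streaming update of the shared covariance, driven by the
            # deviation of z from the current running mean of its class.
            diff = (z - self.means[y]).reshape(-1, 1)
            delta = (self.total / (self.total + 1)) * (diff @ diff.T)
            self.cov = (self.total * self.cov + delta) / (self.total + 1)
        # Running-mean update for class y; the raw example is then discarded.
        self.means[y] = (self.counts[y] * self.means[y] + z) / (self.counts[y] + 1)
        self.counts[y] += 1
        self.total += 1

    def predict(self, z):
        """Score every class with the standard LDA linear discriminant."""
        d = self.means.shape[1]
        precision = np.linalg.inv(
            (1.0 - self.shrinkage) * self.cov + self.shrinkage * np.eye(d)
        )
        W = self.means @ precision                  # class weight vectors, shape (num_classes, d)
        b = -0.5 * np.sum(W * self.means, axis=1)   # per-class biases
        return int(np.argmax(W @ z + b))
```

Because only the means, counts, and covariance are stored, the memory footprint is fixed by the feature dimension and the number of classes, not by the length of the data stream.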
Several noteworthy findings are reported:
- Numerical Superiority: Deep SLDA achieves superior classification accuracy, including strong top-5 accuracy on ImageNet, with significantly reduced training time and memory usage compared to existing models such as iCaRL and End-to-End Incremental Learning.
- Optimization Efficiency: Updating only the output layer and using a running covariance matrix as the statistical representation removes the need for complex data rehearsal mechanisms (see the sketch after this list). The result is a model that can train over 100 times faster while consuming minimal memory, a factor critical for deployment on embedded systems and devices with limited computational capabilities.
- Flexibility and Robust Performance: Deep SLDA adapts exceptionally well to different data orderings, as evidenced by its performance across the various temporal data streams in the CORe50 dataset, indicating robustness to non-iid data, which is common in real-world applications.
- Comparison with Regularization Methods: Deep SLDA remains competitive even against models that require additional information, such as task labels, to reduce forgetting, and it does so without those requirements, suggesting broader applicability than such constraint-heavy methods.
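To make the single-pass, fixed-memory property concrete, here is a short, hypothetical usage sketch of the head defined earlier. The synthetic feature stream below stands in for features produced by a fixed, pretrained CNN backbone, which is an assumption of this sketch rather than a detail taken from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_dim, num_classes = 64, 10   # small stand-ins; real features would come from a CNN's penultimate layer

head = StreamingLDAHead(feature_dim, num_classes)

# Each example is seen once and then discarded: no replay buffer is kept.
for _ in range(2000):
    y = int(rng.integers(num_classes))
    z = rng.normal(loc=float(y), scale=1.0, size=feature_dim)  # class-dependent synthetic "features"
    head.fit_one(z, y)

# The head's state is just the class means (num_classes x feature_dim) and one
# feature_dim x feature_dim covariance matrix, independent of stream length.
print(head.predict(rng.normal(loc=3.0, scale=1.0, size=feature_dim)))  # expected output: 3
```

In this toy run the classifier recovers the class of a held-out sample after a single pass over the stream, which is the behavior that makes the approach attractive for embedded, real-time settings.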
The paper proposes that the simplicity, memory efficiency, and processing speed of this approach provide practical advantages for embedded systems where rapid learning and resource conservation are paramount. Moreover, the research opens avenues for expanding the utility of deep SLDA within larger networks and exploring its integration with generative rehearsal techniques for improved long-term retention without compromising newly acquired knowledge.
The implications of this work extend beyond theoretical interest, pointing to practical applications in robotics, on-device learning for mobile devices, and AI systems operating in changing real-time environments. The work prompts a reconsideration of longstanding learning paradigms in neural networks and motivates future exploration of minimalist model architectures that achieve high performance in lifelong learning tasks. It represents a step toward more autonomous, adaptable AI systems that better mimic human-like learning without excessive resource expenditure.