Learning feed-forward one-shot learners (1606.05233v1)

Published 16 Jun 2016 in cs.CV and cs.LG

Abstract: One-shot learning is usually tackled by using generative models or discriminative embeddings. Discriminative methods based on deep learning, which are very effective in other learning scenarios, are ill-suited for one-shot learning as they need large amounts of training data. In this paper, we propose a method to learn the parameters of a deep model in one shot. We construct the learner as a second deep network, called a learnet, which predicts the parameters of a pupil network from a single exemplar. In this manner we obtain an efficient feed-forward one-shot learner, trained end-to-end by minimizing a one-shot classification objective in a learning to learn formulation. In order to make the construction feasible, we propose a number of factorizations of the parameters of the pupil network. We demonstrate encouraging results by learning characters from single exemplars in Omniglot, and by tracking visual objects from a single initial exemplar in the Visual Object Tracking benchmark.

Citations (459)

Summary

  • The paper introduces a learnet that predicts the pupil network's parameters from a single exemplar, turning one-shot learning into a feed-forward prediction problem.
  • It employs parameter factorizations for fully connected and convolutional layers to keep the prediction tractable and mitigate overfitting.
  • Experiments on Omniglot and the VOT benchmark show notable accuracy improvements over siamese baselines, together with real-time tracking performance.

Analysis of "Learning Feed-Forward One-Shot Learners"

This paper presents an innovative approach to one-shot learning with deep neural networks, introducing a model called a "learnet" that predicts the parameters of a second network (the "pupil") from a single example. One-shot learning, which requires acquiring a new concept from a single exemplar, is typically difficult for deep discriminative models because they depend on large amounts of training data. In the learnet paradigm, the pupil's parameters are instead generated dynamically by the learnet, so one-shot learning reduces to a single, efficient feed-forward pass.
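
As a concrete illustration of the idea (not the authors' code), the PyTorch sketch below has a learnet encode a single exemplar z and emit the full weight matrix of a one-layer pupil, which is then applied to a query x. All module names and dimensions here are hypothetical; the point is that this naive version must output one number per pupil weight, the blow-up that the factorizations discussed below avoid.

```python
import torch
import torch.nn as nn

class NaiveLearnet(nn.Module):
    """Illustrative sketch: predict the *entire* weight matrix of a
    one-layer pupil from a single exemplar (shapes are hypothetical)."""

    def __init__(self, feat_dim=256, out_dim=128):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU())
        # One output per pupil weight: out_dim * feat_dim numbers per
        # exemplar -- the quadratic growth the paper's factorizations avoid.
        self.to_weights = nn.Linear(512, out_dim * feat_dim)
        self.out_dim, self.feat_dim = out_dim, feat_dim

    def forward(self, z, x):
        # z: (B, feat_dim) exemplar features; x: (B, feat_dim) query features.
        w = self.to_weights(self.encode(z))               # (B, out_dim*feat_dim)
        w = w.view(-1, self.out_dim, self.feat_dim)       # per-exemplar pupil weights
        return torch.bmm(w, x.unsqueeze(-1)).squeeze(-1)  # pupil layer: W(z) x
```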

Methodological Innovation

The core contribution is the learnet, a neural network trained to infer the pupil network's parameters from a single exemplar. The authors contrast this with the usual options, generative models and discriminative embedding architectures such as siamese networks, and instead propose a feed-forward model that predicts the parameters of a deep discriminative pupil in a single pass.

Naively predicting the full set of pupil parameters is infeasible, since the learnet's output dimensionality would grow with the number of weights in the pupil. The authors therefore propose parameter factorizations. For a fully connected layer, the weight matrix is decomposed in a manner reminiscent of the Singular Value Decomposition (SVD), W(z) = M′ diag(d(z)) M, where the projections M and M′ are ordinary parameters learned offline and shared across tasks, and only the diagonal d(z) is predicted from the exemplar z. For convolutional layers, the factorization combines pixel-wise projections with predicted filters acting as a basis set. Predicting only diagonal elements sharply reduces the dimensionality of the prediction problem, mitigating both overfitting and computational cost.
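
A minimal sketch of the fully connected factorization, under the same hypothetical shapes as above: M and M′ are ordinary learned parameters shared across all tasks, and the learnet head predicts only the rank-sized diagonal d(z).

```python
import torch
import torch.nn as nn

class FactorizedLinear(nn.Module):
    """Illustrative pupil layer W(z) = M' diag(d(z)) M, where only the
    diagonal d(z) is predicted per exemplar (shapes are hypothetical)."""

    def __init__(self, in_dim=256, out_dim=128, rank=64, embed_dim=512):
        super().__init__()
        self.M = nn.Linear(in_dim, rank, bias=False)        # shared, learned offline
        self.M_prime = nn.Linear(rank, out_dim, bias=False) # shared, learned offline
        self.predict_diag = nn.Linear(embed_dim, rank)      # learnet head: `rank` numbers

    def forward(self, x, z_embed):
        # x: (B, in_dim) query features; z_embed: (B, embed_dim) exemplar embedding.
        d = self.predict_diag(z_embed)      # (B, rank): exemplar-dependent diagonal d(z)
        return self.M_prime(d * self.M(x))  # = M' diag(d(z)) M x
```

Relative to the naive predictor sketched earlier, the learnet's output shrinks from out_dim × in_dim values per exemplar to just rank values; the paper's convolutional variant applies the analogous trick, with pixel-wise projections around predicted basis filters.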

Experimental Results

The empirical evaluation covers two applications: one-shot character recognition on the Omniglot dataset and visual object tracking. On Omniglot, the single-stream learnet achieves an error rate of 28.6%, a clear improvement over the 37.3% of a traditional siamese network with shared weights, confirming that the reduction in complexity afforded by dynamic parameter prediction enables effective one-shot learning.

The second application is object tracking, where the learnet is trained on video data from the ImageNet challenge and evaluated on the VOT 2015 benchmark. Here, the learned feed-forward approach shows competitive performance while tracking in real time at speeds exceeding 60 FPS.

Implications and Future Directions

The implications of the learnet approach are multifaceted. Practically, it offers a promising avenue for real-time applications that demand immediate adaptation, such as video tracking, personalized AI experiences, or medical imaging, settings where data arrives at high rates but labels are sparse or costly. Theoretically, the work suggests new directions for meta-learning and learning-to-learn, highlighting the potential to generalize across tasks by encoding learned prior knowledge in highly parameterized models. Future research might explore sharing learnets across domains or integrating domain adaptation, extending one-shot learning to heterogeneous environments.

Conclusion

In summary, the authors make a substantial contribution to one-shot learning by demonstrating that dynamic parameter prediction with learnets improves both the efficiency and the efficacy of learning in data-constrained regimes. The strong experimental results, spanning recognition and tracking, offer fresh insight into the capabilities and design of neural networks tailored for one-shot learning scenarios.