- The paper demonstrates that integrating a ranking proxy task enables CNNs to leverage unlabeled data for enhanced regression performance in imaging tasks.
- It introduces a novel backpropagation technique for Siamese networks that cuts computational overhead in multi-branch architectures.
- The method employs an active learning strategy based on uncertainty to reduce labeling costs by up to 50% while improving model accuracy.
Leveraging Unlabeled Data through Self-supervised Learning in Computer Vision
Recent advances in machine learning have underscored the need for large volumes of labeled data to train reliable models, particularly convolutional neural networks (CNNs). However, the arduous nature and high cost of data labeling in certain domains, such as image quality assessment (IQA) and crowd counting, render this approach less feasible. Motivated by these challenges, the paper "Exploiting Unlabeled Data in CNNs by Self-supervised Learning to Rank" explores an alternative strategy: leveraging unlabeled data through self-supervised learning to improve performance on regression tasks.
Overview and Contributions
This research proposes integrating ranking as a proxy task within self-supervised learning frameworks for CNNs. By defining this auxiliary task, the authors show how ranking can be used to harness unlabeled data effectively. The paper makes three main contributions:
- Self-supervised Learning through Ranking: It demonstrates how ranking tasks can operate as self-supervised proxy tasks, enabling networks to exploit unlabeled data and improve models where labeled datasets are scarce.
- Efficient Backpropagation Technique: The authors introduce a novel backpropagation method tailored for Siamese networks, specifically to reduce the redundant computations that typically accompany multi-branch architectures (a minimal sketch of this idea follows this list).
- Active Learning Application: Leveraging the uncertainty in the proxy task, the paper formulates an active learning strategy to pinpoint images that would most benefit the model if labeled, reducing labeling costs by up to 50%.
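To make the second contribution concrete, the sketch below illustrates the underlying idea: every image in a minibatch is forwarded once through a single shared network, and a pairwise hinge ranking loss is then formed over all ranked pairs of outputs, so a single backward pass accumulates the gradient contributions of every pair. This is a hedged PyTorch illustration, not the authors' released code; the function name and the usage lines are hypothetical.

```python
import torch

def pairwise_ranking_hinge_loss(scores, ranks, margin=1.0):
    """Hinge ranking loss over all pairs (i, j) where ranks[i] > ranks[j].

    scores: shape (B,), scalar outputs of the shared CNN for one minibatch
    ranks:  shape (B,), known relative ordering (higher value = should score higher)
    """
    diff = scores.unsqueeze(1) - scores.unsqueeze(0)          # diff[i, j] = s_i - s_j
    should_outrank = ranks.unsqueeze(1) > ranks.unsqueeze(0)   # True where i must outrank j
    hinge = torch.clamp(margin - diff, min=0.0)                # penalize s_i - s_j < margin
    return hinge[should_outrank].mean()

# Hypothetical usage with any scalar-output CNN `model`:
#   scores = model(images).squeeze(1)   # one forward pass for the whole batch
#   loss = pairwise_ranking_hinge_loss(scores, ranks)
#   loss.backward()                      # one backward pass covers every ranked pair
```

Compared with a literal two-branch Siamese network that reprocesses an image once per pair, this formulation computes each image's features only once, however many pairs the image participates in.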
The methodologies presented are applied to two computer vision regression problems: no-reference image quality assessment and crowd counting.
Image Quality Assessment
For IQA, where conventional methods rely on small datasets requiring extensive human annotation, the paper presents a way to automatically generate ranking data by distorting images with various distortion types and intensities. The proposed multi-task learning framework combines the self-supervised ranking task with supervised regression on labeled datasets. This approach showed marked improvements in correlation coefficients over state-of-the-art methods, substantiating the value of ranking as a proxy task.
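The key observation is that applying the same distortion at increasing strengths yields images whose relative quality is known by construction, so ranked training data comes for free. The sketch below is a minimal illustration assuming Pillow is available and Gaussian blur as the distortion; the function name is hypothetical and this is not the paper's exact pipeline.

```python
from PIL import Image, ImageFilter

def ranked_blur_sequence(path, radii=(0, 1, 2, 4, 8)):
    """Return (image, rank) pairs from one unlabeled image.

    A smaller blur radius means less degradation, so it receives a higher
    quality rank. Only the ordering is used; no absolute quality score exists.
    """
    img = Image.open(path).convert("RGB")
    pairs = []
    for idx, radius in enumerate(radii):
        distorted = img if radius == 0 else img.filter(ImageFilter.GaussianBlur(radius))
        pairs.append((distorted, len(radii) - idx))  # higher rank = better quality
    return pairs
```

Any pair drawn from this sequence can then feed the ranking loss above, since the ordering between the two images is guaranteed by how they were generated.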
Crowd Counting
A key challenge in crowd counting is the diverse and complex nature of visual scenes, which demands sophisticated models for accurate estimation. Using self-supervised learning, the authors propose generating ranked subsets from unlabeled data by exploiting patch inclusion: a patch contained within a larger patch can hold at most as many people, so relative counts are known without any labels. Experiments demonstrated that training CNNs with these ranked subsets improved performance, bringing results in line with state-of-the-art methods.
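The containment argument can be made explicit with a small sketch: nested crops sharing one center are ordered by construction, because a sub-crop can never contain more people than the crop that encloses it. The helper below is a hypothetical Pillow-based illustration of generating such nested crops, not the paper's exact sampling procedure.

```python
from PIL import Image

def nested_crops(path, num_crops=5, shrink=0.75):
    """Return crops ordered from largest to smallest, all sharing one center.

    The true (unknown) person counts are non-increasing along the returned
    list, which is exactly the ranking information the proxy task needs.
    """
    img = Image.open(path).convert("RGB")
    w, h = img.size
    cx, cy = w // 2, h // 2
    crops = []
    cw, ch = w, h
    for _ in range(num_crops):
        box = (cx - cw // 2, cy - ch // 2, cx + cw // 2, cy + ch // 2)
        crops.append(img.crop(box))
        cw, ch = int(cw * shrink), int(ch * shrink)
    return crops  # crops[i] contains crops[i+1], so count(crops[i]) >= count(crops[i+1])
```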
Implications and Future Directions
The paper’s implications stretch beyond the immediate applications explored. The concept of self-supervised learning through auxiliary tasks such as ranking is broadly applicable in machine learning domains beset by a paucity of labeled data. By decoupling certain learning components and engaging unlabeled data in meaningful ways, researchers can potentially develop more robust models with less dependence on labeled data.
Future research could expand this framework to other regression-based domains and dive deeper into active learning and uncertainty estimation to optimize which data are labeled. Such advances hold promise for more practical, scalable AI systems.
In conclusion, while this paper highlights effective strategies for exploiting unlabeled data in computer vision, it also invites further exploration of how similar techniques can be extended to other machine learning tasks. The insights gleaned underscore a broader shift toward data-efficient learning paradigms, paving the way for more sustainable and innovative AI solutions.