Papers
Topics
Authors
Recent
Search
2000 character limit reached

Training Neural Networks on Data Sources with Unknown Reliability

Published 6 Dec 2022 in cs.LG and stat.ML | (2212.02895v4)

Abstract: When data is generated by multiple sources, conventional training methods update models assuming equal reliability for each source and do not consider their individual data quality. However, in many applications, sources have varied levels of reliability that can have negative effects on the performance of a neural network. A key issue is that often the quality of the data for individual sources is not known during training. Previous methods for training models in the presence of noisy data do not make use of the additional information that the source label can provide. Focusing on supervised learning, we aim to train neural networks on each data source for a number of steps proportional to the source's estimated reliability by using a dynamic re-weighting strategy motivated by likelihood tempering. This way, we allow training on all sources during the warm-up and reduce learning on less reliable sources during the final training stages, when it has been shown that models overfit to noise. We show through diverse experiments that this can significantly improve model performance when trained on mixtures of reliable and unreliable data sources, and maintain performance when models are trained on reliable sources only.

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.