
Improving Generalization via Meta-Learning on Hard Samples

Published 18 Mar 2024 in cs.LG and cs.CV | arXiv:2403.12236v2

Abstract: Learned reweighting (LRW) approaches to supervised learning use an optimization criterion to assign weights to training instances, in order to maximize performance on a representative validation dataset. We pose and formalize the problem of optimized selection of the validation set used in LRW training, with the goal of improving classifier generalization. In particular, we show that using hard-to-classify instances in the validation set has both a theoretical connection to, and strong empirical evidence of, improved generalization. We provide an efficient algorithm for training this meta-optimized model, as well as a simple train-twice heuristic for careful comparative study. We demonstrate that LRW with easy validation data performs consistently worse than LRW with hard validation data, establishing the validity of our meta-optimization problem. Our proposed algorithm outperforms a wide range of baselines across datasets and domain-shift challenges (ImageNet-1K, CIFAR-100, Clothing-1M, CAMELYON, WILDS, etc.), with ~1% gains using ViT-B on ImageNet. We also show that using naturally hard examples for validation (ImageNet-R / ImageNet-A) in LRW training for ImageNet improves performance on both clean and naturally hard test instances by 1-2%. Secondary analyses show that using hard validation data in an LRW framework improves margins on test data, hinting at the mechanism underlying our empirical gains. We believe this work opens up new research directions for the meta-optimization of meta-learning in a supervised learning context.
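To make the LRW idea concrete, here is a minimal sketch of one common learned-reweighting scheme (in the spirit of Ren et al.'s "Learning to reweight examples"): each training example receives a weight proportional to the alignment of its loss gradient with the gradient of the loss on a held-out (e.g. hard) validation batch. The toy data, model, and all function names below are illustrative assumptions, not the paper's actual algorithm or code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_logloss(w, x, y):
    # Gradient of the logistic loss for a single example (x, y), y in {0, 1}.
    return (sigmoid(x @ w) - y) * x

# Toy data: 2-D logistic regression; the "validation" batch stands in for
# the hard validation set the abstract argues for.
X_train = rng.normal(size=(32, 2))
y_train = (X_train[:, 0] + 0.1 * rng.normal(size=32) > 0).astype(float)
X_val = rng.normal(size=(8, 2))
y_val = (X_val[:, 0] > 0).astype(float)

w = np.zeros(2)

def lrw_step(w, lr=0.5):
    # 1. Per-example training gradients.
    g_train = np.stack([grad_logloss(w, x, y) for x, y in zip(X_train, y_train)])
    # 2. Mean gradient on the validation batch (the "meta" objective).
    g_val = np.mean([grad_logloss(w, x, y) for x, y in zip(X_val, y_val)], axis=0)
    # 3. Weight each example by its gradient alignment with the validation
    #    gradient, clipped at zero and normalized to sum to one.
    eps = np.maximum(g_train @ g_val, 0.0)
    weights = eps / eps.sum() if eps.sum() > 0 else np.full(len(eps), 1.0 / len(eps))
    # 4. Weighted gradient step on the training loss.
    return w - lr * (weights[:, None] * g_train).sum(axis=0), weights

for _ in range(50):
    w, weights = lrw_step(w)

val_acc = np.mean((sigmoid(X_val @ w) > 0.5) == y_val.astype(bool))
print(f"validation accuracy: {val_acc:.2f}")
```

The paper's contribution is orthogonal to this inner loop: it concerns *which* examples to place in the validation batch (hard vs. easy), and this sketch only shows where that choice enters the optimization.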
