Transductive Active Learning: Theory and Applications

Published 13 Feb 2024 in cs.LG and cs.AI | arXiv:2402.15898v6

Abstract: We study a generalization of classical active learning to real-world settings with concrete prediction targets where sampling is restricted to an accessible region of the domain, while prediction targets may lie outside this region. We analyze a family of decision rules that sample adaptively to minimize uncertainty about prediction targets. We are the first to show, under general regularity assumptions, that such decision rules converge uniformly to the smallest possible uncertainty obtainable from the accessible data. We demonstrate their strong sample efficiency in two key applications: active fine-tuning of large neural networks and safe Bayesian optimization, where they achieve state-of-the-art performance.


Summary

  • The paper introduces ITL, an information-based transductive active learning method that adaptively samples within an accessible region of the domain to reduce prediction uncertainty about specified targets.
  • The approach is rigorously analyzed and proven to converge to the minimal achievable uncertainty under standard regularity assumptions.
  • Numerical experiments show that ITL outperforms state-of-the-art methods in few-shot fine-tuning of large neural networks and in safe Bayesian optimization.

Information-based Transductive Active Learning

The paper "Information-based Transductive Active Learning" introduces a novel approach named ITL, short for information-based transductive learning. This work extends the paradigm of active learning to real-world problems where the data points available for training and the prediction targets need not lie in the same region of the domain: sampling is restricted to an accessible region, while predictions may be required beyond it. The proposed method maximizes the informativeness of each sample with respect to specific prediction objectives, in contrast to traditional active learning frameworks that aim to reduce uncertainty across the entire input space.
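In Gaussian process notation, a decision rule of this kind can be sketched as picking, at each round, the accessible point whose observation is most informative about the targets. The notation below is a paraphrase under assumed symbols, not the paper's verbatim statement:

```latex
% S: accessible sample space, A: set of prediction targets,
% f: unknown function, y_x: noisy observation of f at x.
x_n \;=\; \operatorname*{arg\,max}_{x \in S} \;
  I\bigl(f_A;\, y_x \,\big|\, y_{x_1}, \dots, y_{x_{n-1}}\bigr)
```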

Methodological Contributions

The central contribution of this paper is the development of ITL, which is anchored in a transductive learning framework. The method adaptively samples from a potentially limited set of accessible data points with the aim of reducing prediction uncertainty for a pre-specified set of targets. Notably, ITL is versatile, applying to varied learning scenarios including few-shot fine-tuning of neural networks and safe Bayesian optimization.
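As a concrete illustration (a minimal sketch, not the paper's implementation), the following code greedily picks the accessible candidate that most shrinks a Gaussian process posterior over a fixed set of target points. The RBF kernel, noise level, and all point sets are hypothetical choices:

```python
import numpy as np

def rbf(a, b, ls=1.0):
    # Squared-exponential kernel matrix between point sets a (n,d) and b (m,d).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def target_entropy(X_obs, targets, noise=1e-2):
    # Entropy proxy for the GP posterior over the targets: the log-determinant
    # of the posterior covariance of f(targets) given noisy observations at X_obs.
    K_tt = rbf(targets, targets)
    if len(X_obs) == 0:
        post = K_tt
    else:
        K_xx = rbf(X_obs, X_obs) + noise * np.eye(len(X_obs))
        K_tx = rbf(targets, X_obs)
        post = K_tt - K_tx @ np.linalg.solve(K_xx, K_tx.T)
    _, logdet = np.linalg.slogdet(post + 1e-9 * np.eye(len(targets)))
    return logdet

def greedy_transductive_step(candidates, X_obs, targets):
    # Pick the accessible candidate whose observation would most reduce
    # uncertainty (log-det posterior covariance) about the target points.
    scores = [
        target_entropy(
            np.vstack([X_obs, x[None]]) if len(X_obs) else x[None], targets
        )
        for x in candidates
    ]
    return candidates[int(np.argmin(scores))]
```

Because the posterior covariance over the targets can only shrink (in the Loewner order) as observations accumulate, each greedy step weakly decreases the entropy proxy; the targets themselves never need to be sampled, which is the transductive aspect.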

Under regularity assumptions standard in machine learning, ITL is shown to asymptotically reach the minimal uncertainty achievable from the accessible data alone. This is a key advancement: it rigorously establishes the method's convergence properties, which are essential in settings where data availability is limited.
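The guarantee can be paraphrased as uniform convergence of the posterior uncertainty at the targets to an irreducible floor. The symbols below are assumptions for illustration, not the paper's exact notation:

```latex
% \sigma_n^2(x): posterior variance at x after n adaptively chosen samples,
% \eta_S^2(x): smallest variance attainable from observing the accessible
% region S exhaustively, A: the set of prediction targets.
\max_{x \in A} \Bigl( \sigma_n^2(x) - \eta_S^2(x) \Bigr)
  \;\longrightarrow\; 0 \qquad (n \to \infty)
```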

Numerical Results

The paper reports compelling numerical results highlighting the efficacy of ITL. In few-shot fine-tuning of large neural networks, ITL surpasses existing state-of-the-art methods, and the same holds in safe Bayesian optimization, supporting the practicality of the framework relative to traditional approaches. While this summary does not reproduce specific quantitative metrics, the reported gains indicate meaningful improvements in predictive performance and sample efficiency.

Theoretical Implications and Generalization

From a theoretical standpoint, the insights ITL provides into transductive learning reinforce the value of targeted information acquisition over broad, undirected exploration. This echoes Vapnik's principle: when solving a problem of interest, one should not solve a more general problem as an intermediate step. The principle not only shapes the algorithm's design but also informs how we conceptualize learning in constrained or partially observable environments.

The paper includes extensive theoretical underpinnings, establishing the robustness of ITL across various kernel choices through the complexity analysis typical of Gaussian processes. In particular, a kernel-based information complexity is examined, yielding insight into how quickly ITL can drive down uncertainty about the targets.
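The complexity quantity alluded to here is plausibly a transductive analogue of the maximum information gain familiar from GP bandits; a generic form of that quantity, in assumed notation, is:

```latex
% Maximum information gain after n samples restricted to the accessible
% region S (generic GP-bandit form; the paper's exact definition may differ).
\gamma_n \;=\; \max_{X \subseteq S,\; |X| = n} \; I\bigl(f_X;\, y_X\bigr)
```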

Future Directions in AI

Looking ahead, the principles underlying ITL may set the stage for further integration of targeted learning approaches within broader AI systems. Especially in deep learning, where data acquisition costs are high and domains are increasingly specialized, such methods offer viable strategies to improve model performance without extensive data expansion. Further work on adaptive sampling and domain-bound learning is anticipated, with potential extensions to reinforcement learning and autonomous systems where environment-model asymmetry is prevalent.

In sum, this paper establishes ITL as a compelling advancement in the active learning landscape, offering a rigorous yet practical approach to transductive learning challenges. Its contributions serve both as a robust tool for current machine learning applications and as a foundational reference for ongoing work on adaptive, target-directed learning methodologies.
