Advancing Deep Active Learning & Data Subset Selection: Unifying Principles with Information-Theory Intuitions (2401.04305v3)

Published 9 Jan 2024 in cs.LG, cs.IT, and math.IT

Abstract: At its core, this thesis aims to enhance the practicality of deep learning by improving the label and training efficiency of deep learning models. To this end, we investigate data subset selection techniques, specifically active learning and active sampling, grounded in information-theoretic principles. Active learning improves label efficiency, while active sampling enhances training efficiency. Supervised deep learning models often require extensive training with labeled data. Label acquisition can be expensive and time-consuming, and training large models is resource-intensive, hindering the adoption outside academic research and "big tech." Existing methods for data subset selection in deep learning often rely on heuristics or lack a principled information-theoretic foundation. In contrast, this thesis examines several objectives for data subset selection and their applications within deep learning, striving for a more principled approach inspired by information theory. We begin by disentangling epistemic and aleatoric uncertainty in single forward-pass deep neural networks, which provides helpful intuitions and insights into different forms of uncertainty and their relevance for data subset selection. We then propose and investigate various approaches for active learning and data subset selection in (Bayesian) deep learning. Finally, we relate various existing and proposed approaches to approximations of information quantities in weight or prediction space. Underpinning this work is a principled and practical notation for information-theoretic quantities that includes both random variables and observed outcomes. This thesis demonstrates the benefits of working from a unified perspective and highlights the potential impact of our contributions to the practical application of deep learning.

Authors (1)
  1. Andreas Kirsch (30 papers)
Citations (6)

Summary

Overview of Advances in Deep Active Learning and Data Subset Selection

The dissertation "Advancing Deep Active Learning & Data Subset Selection: Unifying Principles with Information-Theory Intuitions" examines how to make deep learning models more efficient in both labeling and training. Adopting an information-theoretic perspective throughout, the work investigates active learning and active sampling with the aim of making deep learning practical outside academic and large-scale tech environments.

Context and Motivation

Deep learning models are challenging to deploy because of their extensive labeled-data requirements and resource-intensive training, demands that can inhibit adoption outside well-funded research labs and major technology companies. The thesis tackles these obstacles through data subset selection, chiefly within active learning frameworks grounded in information-theoretic principles, to improve labeling and training efficiency.

Disentangling Uncertainty in Deep Learning

The thesis begins by disentangling epistemic uncertainty (reducible uncertainty about the model) from aleatoric uncertainty (irreducible noise in the data), including in single forward-pass deep neural networks. This separation is central to data subset selection: in active learning, points with high epistemic uncertainty are the informative ones, and targeting them can substantially reduce labeling requirements.
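
To make the decomposition concrete, here is a minimal sketch, assuming Monte Carlo samples of the predictive distribution are available (e.g., from MC dropout or a deep ensemble); the function name and shapes are illustrative, not the thesis's own API. Total predictive entropy splits into an expected-entropy (aleatoric) term and a mutual-information (epistemic) term:

```python
import numpy as np

def uncertainty_decomposition(probs: np.ndarray, eps: float = 1e-12):
    """Decompose predictive uncertainty for a single input.

    probs: (n_samples, n_classes) -- one softmax vector per stochastic
    forward pass (MC dropout sample or ensemble member).
    """
    mean_probs = probs.mean(axis=0)
    # Total uncertainty: entropy of the averaged predictive distribution.
    total = -np.sum(mean_probs * np.log(mean_probs + eps))
    # Aleatoric uncertainty: average entropy of the individual predictions.
    aleatoric = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    # Epistemic uncertainty: the gap, i.e. the mutual information between
    # the prediction and the model parameters (the BALD score).
    return total, aleatoric, total - aleatoric

# Two confident but disagreeing passes -> high epistemic uncertainty.
print(uncertainty_decomposition(np.array([[0.9, 0.1], [0.1, 0.9]])))
```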

Information-Theoretic Approaches

A central aspect of the thesis is the use of information-theoretic quantities to build a principled framework for data subset selection. It develops several active learning strategies and relates both existing and newly proposed approaches to approximations of information quantities in weight or prediction space, such as the expected information gain about the model parameters. This unified treatment underscores the benefit of connecting information-theoretic insight to practical algorithms.
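
As one concrete instance, the prediction-space mutual information above doubles as an acquisition function. The following hedged sketch scores an unlabeled pool with the BALD objective and greedily takes the top-k; names are illustrative, and naive top-k selection ignores redundancy between chosen points, the issue that batch-aware methods in this line of work (e.g., BatchBALD) address:

```python
import numpy as np

def bald_scores(pool_probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """pool_probs: (n_points, n_samples, n_classes) stochastic predictions."""
    mean_probs = pool_probs.mean(axis=1)
    total = -np.sum(mean_probs * np.log(mean_probs + eps), axis=-1)
    aleatoric = -np.mean(np.sum(pool_probs * np.log(pool_probs + eps), axis=-1), axis=-1)
    # Mutual information between the prediction and the model parameters.
    return total - aleatoric

def acquire_top_k(pool_probs: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k highest-scoring pool points, best first."""
    return np.argsort(bald_scores(pool_probs))[-k:][::-1]
```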

Empirical Evaluation and Unifying Frameworks

The empirical component of the thesis evaluates these approaches on standard benchmarks, showing gains in label and training efficiency. The work further unifies active learning and active sampling within a single framework, demonstrating how information-theory-driven methods can respect computational constraints while maintaining performance.
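
The methods above plug into the standard pool-based loop that such evaluations use: train, score the pool, acquire labels, repeat. A minimal sketch follows, where `train`, `mc_predict`, and `oracle` are caller-supplied placeholders (assumptions for illustration, not the thesis's code) and `acquire_top_k` is the scorer from the previous sketch:

```python
def active_learning_loop(train, mc_predict, oracle, labelled, pool, rounds=10, k=10):
    """train(labelled) -> model; mc_predict(model, pool) -> array of shape
    (n_points, n_samples, n_classes); oracle(x) -> label for x."""
    for _ in range(rounds):
        model = train(labelled)               # retrain on the current labels
        pool_probs = mc_predict(model, pool)  # stochastic forward passes
        chosen = {int(i) for i in acquire_top_k(pool_probs, k)}
        labelled = labelled + [(pool[i], oracle(pool[i])) for i in chosen]
        pool = [x for i, x in enumerate(pool) if i not in chosen]
    return train(labelled)
```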

Implications and Future Directions

By improving label and training efficiency through the lens of information theory, the thesis helps make deep learning more accessible and practical, and it paves the way for AI applications where data and compute are limited. The approaches are grounded in theory, and the insights may influence how future deep learning systems handle uncertainty and organize their training efficiently.

In summary, this dissertation shows how the reach of deep learning can be extended through intelligent data subset selection, informed by a principled understanding of uncertainty and information theory. Beyond its practical advances, the work enriches the theoretical landscape and suggests avenues for future inquiry into AI model training paradigms.