Multiple Instance Learning: A Survey of Problem Characteristics and Applications (1612.03365v1)

Published 11 Dec 2016 in cs.CV, cs.AI, and cs.IR

Abstract: Multiple instance learning (MIL) is a form of weakly supervised learning where training instances are arranged in sets, called bags, and a label is provided for the entire bag. This formulation is gaining interest because it naturally fits various problems and allows to leverage weakly labeled data. Consequently, it has been used in diverse application fields such as computer vision and document classification. However, learning from bags raises important challenges that are unique to MIL. This paper provides a comprehensive survey of the characteristics which define and differentiate the types of MIL problems. Until now, these problem characteristics have not been formally identified and described. As a result, the variations in performance of MIL algorithms from one data set to another are difficult to explain. In this paper, MIL problem characteristics are grouped into four broad categories: the composition of the bags, the types of data distribution, the ambiguity of instance labels, and the task to be performed. Methods specialized to address each category are reviewed. Then, the extent to which these characteristics manifest themselves in key MIL application areas are described. Finally, experiments are conducted to compare the performance of 16 state-of-the-art MIL methods on selected problem characteristics. This paper provides insight on how the problem characteristics affect MIL algorithms, recommendations for future benchmarking and promising avenues for research.


Overview of "Multiple Instance Learning: A Survey of Problem Characteristics and Applications"

This paper offers a detailed survey of Multiple Instance Learning (MIL) by examining its problem characteristics and applications. MIL is a weakly supervised learning approach where instances are grouped into bags, each with a single label. This approach fits various scenarios where only aggregated labels are available, and it has been applied in fields such as computer vision and document classification. However, the challenges specific to MIL, stemming from its unique data structure, have yet to be comprehensively addressed. This survey categorizes these challenges, reviews methods suited to each, and evaluates algorithm performance in different MIL contexts.
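
The bag structure described above can be sketched in a few lines. This is a minimal illustration, not code from the paper: the `Bag` class, the helper `bag_label_from_instances`, and the toy feature values are all hypothetical. It assumes the standard MIL assumption (a bag is positive if at least one of its instances is positive), which is only one of several assumptions used in the MIL literature.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Bag:
    """A set of instance feature vectors that shares a single label."""
    instances: List[Sequence[float]]
    label: int  # 1 = positive bag, 0 = negative bag


def bag_label_from_instances(instance_labels: Sequence[int]) -> int:
    """Standard MIL assumption (assumed here): positive if any instance is positive."""
    return int(any(instance_labels))


# Hypothetical toy data. In a real MIL problem the instance labels below
# are hidden from the learner; only the bag labels are observed.
hidden_instance_labels = [[0, 0, 1], [0, 0, 0]]

bags = [
    Bag(instances=[[0.1, 0.2], [0.0, 0.3], [0.9, 0.8]],
        label=bag_label_from_instances(hidden_instance_labels[0])),
    Bag(instances=[[0.2, 0.1], [0.1, 0.1], [0.3, 0.2]],
        label=bag_label_from_instances(hidden_instance_labels[1])),
]

for bag in bags:
    print(len(bag.instances), "instances, bag label:", bag.label)
```

The learner sees three instances per bag but only one label per bag, which is the source of the ambiguity the survey discusses.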

Key Problem Characteristics

The paper identifies four broad categories that define MIL problems and influence algorithm performance:

Prediction Level:
- The distinction between instance-level and bag-level tasks is essential. While bag classification is more common, instance classification presents additional complexity due to the different misclassification costs at the instance level. Methods like mi-SVM and SI-SVM demonstrate high effectiveness in instance classification, suggesting a preference for methods that treat instances independently of their bags.
Bag Composition:
- Witness Rate: The witness rate (WR), the proportion of positive instances in positive bags, affects algorithms differently. Methods that model instance distributions or average over instances are particularly sensitive to it. Specialized methods such as stMIL and miGraph are designed to cope with low WR.
- Relations Between Instances: Intra-bag similarities, instance co-occurrence, and structured instances affect algorithm performance. Leveraging these relationships can improve classification accuracy, as seen in methods such as miGraph and in graphical models such as conditional random fields (CRF) for temporal and spatial dependencies.
Data Distributions:
- Many algorithms implicitly assume unimodal positive distributions and can struggle with multimodal distributions. Methods like non-parametric classifiers and kernel-based approaches effectively tackle such distributional challenges. Additionally, ensuring the negative distribution is adequately modeled can significantly enhance algorithm robustness.
Label Ambiguity:
- The presence of label noise and different label spaces requires algorithms to either model instance distributions or apply strategies like thresholding to handle ambiguous and potentially noisy labels.
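
To make the witness rate concrete, the following sketch computes it for two hypothetical positive bags whose instance labels are assumed to be known (in practice they are usually hidden). The function name `witness_rate` and the example bags are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def witness_rate(instance_labels: Sequence[int]) -> float:
    """Fraction of instances in one positive bag that are themselves positive."""
    if not instance_labels:
        raise ValueError("a bag must contain at least one instance")
    return sum(instance_labels) / len(instance_labels)


# Hypothetical positive bags; 1 marks a positive instance (a "witness").
high_wr_bag = [1, 1, 1, 0]
low_wr_bag = [1, 0, 0, 0, 0, 0, 0, 0]

print(witness_rate(high_wr_bag))  # 0.75
print(witness_rate(low_wr_bag))   # 0.125

# If each instance were scored perfectly (score == label), averaging the
# scores of the low-WR bag would give 0.125, while taking the maximum
# would still give 1. This is one way to see why averaging is sensitive to WR.
print(max(low_wr_bag))  # 1
```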

Application Areas

MIL's versatility is evident in several application areas:

Computer Vision: Utilized in object localization and segmentation tasks using weakly labeled data. Challenges here include multimodal distributions and bag structures due to diverse object appearances and spatial contexts.
Text and Document Classification: Applications involve classifying documents from embeddings or from representations such as bag-of-words (BoW).
Bioinformatics and Drug Design: MIL aids in binding site identification and other problems where instances (e.g., molecular conformations) are not independently observable but collectively evaluated.
Medical Imaging: Weakly supervised diagnosis tasks benefit from MIL by using patient-level diagnoses instead of pixel-level annotations.
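
As one illustration of how an application maps onto bags, the sketch below turns a document into a bag whose instances are bag-of-words counts of its passages. The split on blank lines, the whitespace tokenizer, and the function names `bow` and `document_to_bag` are simplifying assumptions made for this example, not the procedure used in the paper.

```python
from collections import Counter
from typing import Dict, List


def bow(text: str) -> Dict[str, int]:
    """Bag-of-words counts using a naive lowercase whitespace tokenizer."""
    return dict(Counter(text.lower().split()))


def document_to_bag(document: str) -> List[Dict[str, int]]:
    """Treat each blank-line-separated passage as one instance of the bag."""
    passages = [p.strip() for p in document.split("\n\n") if p.strip()]
    return [bow(p) for p in passages]


# Hypothetical two-passage document; only the document as a whole would
# carry a topic label, not the individual passages.
doc = "the match ended in a draw\n\nthe new drug binds the target protein"
bag = document_to_bag(doc)

print(len(bag))       # 2
print(bag[1]["the"])  # 2
```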

Experimental Findings and Implications

Experiments conducted with 16 MIL algorithms reveal the influence of problem characteristics on performance. Notably:

Methods that treat instances independently of their bags (e.g., mi-SVM) often outperform bag-level methods when a high WR diminishes the need for bag-based learning.
Bag-level methods excel in scenarios with non-representative negative distributions, highlighting the importance of bag-level features in such contexts.
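
The structural difference between the two families can be sketched as follows. This is a toy illustration under stated assumptions, not a reimplementation of mi-SVM or of any method benchmarked in the paper: `toy_score` is a hypothetical instance scorer, max-pooling stands in for instance-level prediction, and a feature-wise mean stands in for a bag-level representation.

```python
from typing import Callable, List, Sequence

Instance = Sequence[float]


def predict_bag_from_instances(
    bag: Sequence[Instance],
    score_instance: Callable[[Instance], float],
    threshold: float = 0.5,
) -> int:
    """Instance-level route: score each instance alone, then max-pool."""
    return int(max(score_instance(x) for x in bag) >= threshold)


def mean_embedding(bag: Sequence[Instance]) -> List[float]:
    """Bag-level route: summarize the whole bag as one feature-wise mean vector."""
    n = len(bag)
    return [sum(column) / n for column in zip(*bag)]


def toy_score(x: Instance) -> float:
    """Hypothetical scorer: reads the first feature as a positivity score."""
    return x[0]


# Hypothetical positive bag with one witness among four instances.
bag = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

print(predict_bag_from_instances(bag, toy_score))  # 1
print(mean_embedding(bag))                         # [0.25, 0.0]
```

A bag-level classifier would be trained on vectors like the mean embedding, so it sees the whole bag at once but loses track of individual instances. The instance-level route keeps per-instance decisions but ignores any bag-wide structure.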

Recommendations for the MIL community include benchmarking on diverse datasets that cover a broader range of problem characteristics, beyond traditional ones such as Musk and Tiger, Elephant and Fox (TEF). Moreover, emerging areas such as feature learning, multi-modality, and MIL's applicability to large-scale, real-world datasets present future research opportunities.

Through its detailed analysis, this paper provides valuable insights into the varied applications and inherent challenges of MIL, serving as a resource for designing methods tailored to specific characteristics of MIL problems.


Authors

Veronika Cheplygina
Eric Granger
Ghyslain Gagnon
Marc-André Carbonneau

