Semantic Curiosity for Active Visual Learning (2006.09367v1)

Published 16 Jun 2020 in cs.CV, cs.AI, and cs.LG

Abstract: In this paper, we study the task of embodied interactive learning for object detection. Given a set of environments (and some labeling budget), our goal is to learn an object detector by having an agent select what data to obtain labels for. How should an exploration policy decide which trajectory should be labeled? One possibility is to use a trained object detector's failure cases as an external reward. However, this will require labeling millions of frames required for training RL policies, which is infeasible. Instead, we explore a self-supervised approach for training our exploration policy by introducing a notion of semantic curiosity. Our semantic curiosity policy is based on a simple observation -- the detection outputs should be consistent. Therefore, our semantic curiosity rewards trajectories with inconsistent labeling behavior and encourages the exploration policy to explore such areas. The exploration policy trained via semantic curiosity generalizes to novel scenes and helps train an object detector that outperforms baselines trained with other possible alternatives such as random exploration, prediction-error curiosity, and coverage-maximizing exploration.

Summary

The paper introduces semantic curiosity as an intrinsic reward mechanism that guides active visual learning to enhance object detection efficiency.
It employs a semantic mapping technique using RGB and depth images combined with a reward function based on prediction inconsistencies.
Evaluations in diverse simulated environments show that the approach outperforms baseline exploration strategies, yielding object detectors with higher average precision.

Introduction and Motivation

The paper "Semantic Curiosity for Active Visual Learning" addresses embodied interactive learning for object detection. Contemporary computer vision models are typically trained on static datasets of internet images, so they have no control over which data they acquire for their learning objectives. The authors propose a shift towards active visual learning, in which an agent explores an environment and selects which trajectories to have labeled, so that the labeling budget is spent on data the detector can learn from. This approach mimics the interactive learning observed in humans, notably children, who actively engage with their environment based on prior knowledge and expectations.

Methodology

The core innovation is semantic curiosity, an intrinsic reward that guides the exploration policy without external labels. Using a trained detector's failure cases as the reward would require labeling the millions of frames needed to train an RL policy, which is infeasible. Semantic curiosity instead relies on the observation that detection outputs for the same object should be consistent across observations, and it rewards trajectories along which the detector's predictions are inconsistent. Such trajectories expose areas where the current object detector is weak, so labeling them yields observations that are useful for improving its accuracy.

The main steps in this active learning process are:

Semantic Mapping: Incorporating RGB and Depth images to produce top-down semantic maps, enabling the association of object predictions across frames.
Intrinsic Reward Function: Defining a reward based on the temporal inconsistency (entropy) of the object predictions that a pretrained Mask R-CNN produces for the same map location over time. The policy is rewarded for exploring regions where this inconsistency is highest.
Implementation: The exploration policy is trained using Proximal Policy Optimization in large environments with diverse scenes, ensuring the policy generalizes well to novel environments.
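
The first two steps can be illustrated with a minimal sketch. It assumes the detections in each frame have already been projected onto top-down map cells, accumulates per-cell class counts over a trajectory, and scores the trajectory by the summed Shannon entropy of each cell's label distribution. The function names, the input layout, and the exact entropy form are assumptions made for illustration; this is not the authors' implementation, and the paper's reward may be computed differently.

```python
import numpy as np

def accumulate_label_counts(frames, num_classes, map_size):
    """Count how often each class was predicted in each top-down map cell.

    frames: iterable of (cells, labels) pairs (hypothetical layout), where
      cells  is an (N, 2) integer array of map coordinates, and
      labels is an (N,) integer array of predicted class ids.
    """
    counts = np.zeros((num_classes, map_size, map_size), dtype=np.int64)
    for cells, labels in frames:
        # np.add.at accumulates correctly even when an index repeats in a frame.
        np.add.at(counts, (labels, cells[:, 0], cells[:, 1]), 1)
    return counts

def temporal_entropy(counts):
    """Shannon entropy (in nats) of the class distribution in each map cell."""
    total = counts.sum(axis=0, keepdims=True)
    probs = np.divide(counts, total,
                      out=np.zeros(counts.shape, dtype=float),
                      where=total > 0)
    log_p = np.log(probs, out=np.zeros_like(probs), where=probs > 0)
    return -(probs * log_p).sum(axis=0)

def semantic_curiosity_reward(counts):
    """Trajectory score: total label inconsistency over the whole map."""
    return float(temporal_entropy(counts).sum())

# Cell (1, 1) is always labeled class 0; cell (2, 2) flips between classes 0 and 1.
frames = [
    (np.array([[1, 1], [2, 2]]), np.array([0, 0])),
    (np.array([[1, 1], [2, 2]]), np.array([0, 1])),
]
counts = accumulate_label_counts(frames, num_classes=3, map_size=4)
reward = semantic_curiosity_reward(counts)
```

In this toy example only the cell whose label flips contributes to the score, so a policy maximizing it is pushed towards viewpoints where the detector disagrees with itself.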

Experimental Setup and Results

The methodology was evaluated using the Habitat simulator across multiple datasets, including Gibson, Matterport, and Replica, to test how well the policy generalizes across different scenes. In terms of object detection performance in unseen environments, the approach outperformed several baseline exploration strategies, including random exploration, prediction-error curiosity, and coverage-maximizing exploration.

Key findings include:

Increased Semantic Curiosity Reward: Trajectories selected by the semantic curiosity policy exhibited higher temporal inconsistency in the detector's predictions, and therefore provided more valuable data for refining the object detector.
Enhanced Object Detection: When trajectories selected by semantic curiosity were labeled and used for training, the resulting object detectors achieved higher average precision than those trained on data from the baseline policies.
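
For readers unfamiliar with the metric, the sketch below shows one simple way to compute average precision for a single class. It assumes each detection has already been matched against the ground truth (for example by an IoU threshold), so it only needs a confidence score and a true-positive flag per detection. The function name and this non-interpolated variant are illustrative assumptions; the summary does not specify the exact evaluation protocol used in the paper.

```python
import numpy as np

def average_precision(scores, is_true_positive, num_ground_truth):
    """Non-interpolated average precision for one object class.

    scores:            confidence score of each detection
    is_true_positive:  whether each detection matched a ground-truth object
    num_ground_truth:  number of ground-truth objects of this class
    """
    scores = np.asarray(scores, dtype=float)
    tp = np.asarray(is_true_positive, dtype=bool)
    if num_ground_truth == 0 or scores.size == 0:
        return 0.0
    # Rank detections from most to least confident.
    order = np.argsort(-scores)
    tp = tp[order]
    cum_tp = np.cumsum(tp)
    ranks = np.arange(1, tp.size + 1)
    precision = cum_tp / ranks
    # Average the precision at each true-positive rank over all ground-truth
    # objects, so missed objects contribute zero.
    return float(precision[tp].sum() / num_ground_truth)

# Three detections, two ground-truth objects; the second detection is a false positive.
ap = average_precision([0.9, 0.8, 0.7], [True, False, True], num_ground_truth=2)
```

Here the precision is 1 at the first true positive and 2/3 at the second, giving an average precision of about 0.83.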

Implications and Future Directions

This work has significant implications for the field of active learning and computer vision, proposing a shift towards more autonomous and adaptive systems capable of refining themselves based on internal assessments of their performance. The semantic curiosity framework allows for improved data efficiency, reducing the reliance on costly and labor-intensive labeling processes.

Potential future developments may focus on:

Extending semantic curiosity from simulators to real-world settings, and improving its robustness to real-world sensor noise.
Leveraging semantic curiosity in robotic systems for real-time adaptive strategies that balance exploration and exploitation based on current detection performance.
Exploring the integration of semantic curiosity with other AI systems to bolster multi-modality learning and environmental understanding.

In summary, this paper presents valuable insights into the paradigm of embodied visual learning and sets a promising direction for further exploration in dynamically interactive AI systems.
