The VIA Annotation Software for Images, Audio and Video (1904.10699v3)

Published 24 Apr 2019 in cs.CV

Abstract: In this paper, we introduce a simple and standalone manual annotation tool for images, audio and video: the VGG Image Annotator (VIA). This is a lightweight, standalone and offline software package that does not require any installation or setup and runs solely in a web browser. The VIA software allows human annotators to define and describe spatial regions in images or video frames, and temporal segments in audio or video. These manual annotations can be exported to plain text data formats such as JSON and CSV and are therefore amenable to further processing by other software tools. VIA also supports collaborative annotation of a large dataset by a group of human annotators. The BSD open source license of this software allows it to be used in any academic project or commercial application.

Summary

The paper introduces VIA, a lightweight browser-based tool that enables manual annotation of images, audio, and video without installation.
The paper details VIA’s support for diverse spatial and temporal annotations using various shapes and metadata inputs.
The paper highlights VIA’s collaborative design and open-source approach, leading to widespread adoption and continuous development.

Overview of The VIA Annotation Software for Images, Audio, and Video

The paper, authored by Abhishek Dutta and Andrew Zisserman from the Visual Geometry Group (VGG) at the University of Oxford, presents the VGG Image Annotator (VIA), a tool for manually annotating images, audio, and video. VIA stands out for its simplicity, ease of use, and offline operation: it requires no installation or setup and runs directly in a modern web browser.

Fundamental Features

VIA lets human annotators define and describe spatial regions in images or video frames and temporal segments in audio or video. Spatial regions can be drawn as shapes such as rectangles, circles, polygons, or points, and the resulting annotations can be exported to plain text formats such as JSON and CSV for further processing by other tools. VIA also supports collaborative annotation of a large dataset by a group of annotators, and its BSD license permits use in both academic projects and commercial applications.
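To illustrate why plain text export makes annotations easy to process downstream, the following is a minimal sketch that flattens a JSON document of regions into CSV rows. The layout of SAMPLE_EXPORT and its field names (filename, regions, shape, attributes) are assumptions made for illustration; they are not taken from the paper and should not be read as VIA's actual export schema.

```python
import csv
import io
import json

# Hypothetical, simplified export layout (NOT VIA's actual schema):
# one entry per file, each holding a list of regions with a shape and attributes.
SAMPLE_EXPORT = """
{
  "cat.jpg": {
    "filename": "cat.jpg",
    "regions": [
      {"shape": {"name": "rect", "x": 10, "y": 20, "width": 100, "height": 50},
       "attributes": {"label": "cat"}},
      {"shape": {"name": "point", "cx": 42, "cy": 17},
       "attributes": {"label": "eye"}}
    ]
  }
}
"""


def regions_to_csv(json_text: str) -> str:
    """Flatten the assumed JSON layout into one CSV row per region."""
    data = json.loads(json_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["filename", "shape", "geometry", "label"])
    for entry in data.values():
        for region in entry["regions"]:
            geometry = dict(region["shape"])  # copy so the input is not mutated
            shape_name = geometry.pop("name")
            writer.writerow([
                entry["filename"],
                shape_name,
                json.dumps(geometry, sort_keys=True),  # shape-specific fields
                region["attributes"].get("label", ""),
            ])
    return out.getvalue()


if __name__ == "__main__":
    print(regions_to_csv(SAMPLE_EXPORT))
```

Keeping the shape-specific geometry as a JSON string in a single column is one possible design choice: it lets rectangles, points, and polygons share one table without a separate column set per shape.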

Technical Overview

VIA is implemented in HTML, JavaScript, and CSS, and ships as a single HTML file under 400 kilobytes. This minimal design makes it quick to deploy and usable even by non-technical users. Since its first release in August 2016, VIA has gone through multiple versions, the latest of which adds temporal annotation for audio and video. As of mid-2019, it had been used more than a million times, which reflects its wide acceptance and utility.

Image Annotation

The tool supports various shapes for spatial annotation, including rectangles, circles, ellipses, polygons, points, and polylines. Each region can be described with textual metadata, entered either as free text or through predefined interface elements such as checkboxes, radio buttons, and dropdowns, which keep values consistent across annotations. These features matter for projects that need detailed annotations, such as facial landmarks or object boundaries in microscopic images.
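The consistency benefit of predefined input elements can be sketched as a small validation routine: a radio button or dropdown admits exactly one value from a fixed option set, while a checkbox group admits any subset. The names AttributeSpec and is_valid are hypothetical and only model the idea; they do not describe how VIA stores or checks attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeSpec:
    """Hypothetical description of one metadata attribute."""
    kind: str           # "radio", "dropdown", or "checkbox"
    options: frozenset  # the predefined values an annotator may choose


def is_valid(spec: AttributeSpec, value) -> bool:
    """Return True if `value` is allowed by the predefined options."""
    if spec.kind in ("radio", "dropdown"):
        # Single choice: exactly one of the predefined options.
        return isinstance(value, str) and value in spec.options
    if spec.kind == "checkbox":
        # Multiple choice: any subset of the predefined options.
        if isinstance(value, str):
            return False
        return set(value) <= spec.options
    raise ValueError(f"unknown attribute kind: {spec.kind!r}")


if __name__ == "__main__":
    species = AttributeSpec("radio", frozenset({"cat", "dog"}))
    quality = AttributeSpec("checkbox", frozenset({"blurry", "occluded"}))
    print(is_valid(species, "cat"))               # allowed option
    print(is_valid(species, "bird"))              # not in the option set
    print(is_valid(quality, ["blurry", "dark"]))  # "dark" is not predefined
```

Free-text fields would bypass this kind of check, which is why constrained inputs tend to give cleaner labels when several annotators work on the same dataset.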

Image Group Annotation

For large datasets, VIA incorporates an Image Grid View feature that supports a two-stage annotation process involving automatic computer vision-based preliminary annotations followed by manual review and correction. This approach reduces the annotation burden on humans by leveraging automated tools to handle initial, often labor-intensive, annotation tasks.
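The two-stage process can be sketched as follows: automatic labels are grouped so that a reviewer can inspect each group at once, and manual corrections then override the automatic labels. The function names and the dictionary-based data layout are assumptions for illustration only; the automatic labels here are hard-coded stand-ins for the output of some computer vision model.

```python
from collections import defaultdict


def group_by_label(predictions: dict) -> dict:
    """Stage 1 output, organised for review: label -> sorted list of filenames."""
    groups = defaultdict(list)
    for filename, label in sorted(predictions.items()):
        groups[label].append(filename)
    return dict(groups)


def apply_corrections(predictions: dict, corrections: dict) -> dict:
    """Stage 2: manual corrections take precedence over automatic labels."""
    unknown = set(corrections) - set(predictions)
    if unknown:
        raise KeyError(f"corrections for unknown files: {sorted(unknown)}")
    return {**predictions, **corrections}


if __name__ == "__main__":
    # Stand-in for automatic preliminary annotations (hypothetical values).
    automatic = {"a.jpg": "cat", "b.jpg": "cat", "c.jpg": "dog"}
    print(group_by_label(automatic))

    # A reviewer scanning the "cat" group spots one mistake and fixes it.
    final = apply_corrections(automatic, {"b.jpg": "dog"})
    print(group_by_label(final))
```

Grouping by predicted label means a reviewer only needs to act on the exceptions in each group, which is where the reduction in manual effort comes from.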

Audio and Video Annotation

VIA extends to the temporal domain by supporting annotation of audio and video segments. Tasks such as speaker diarization, which assigns each stretch of speech to a speaker, can be performed with the tool, making it useful for both academic research and practical applications. Annotators can delineate and describe speech segments or mark the intervals in a video where an activity occurs, which broadens the range of multimedia applications VIA can support.
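A temporal segment can be modelled as a start time, an end time, and a description. The sketch below uses that model to total speaking time per speaker, as one might do with diarization annotations. The Segment class and its field names are hypothetical and do not reflect VIA's internal representation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical temporal annotation: times are in seconds."""
    start: float
    end: float
    speaker: str


def speaking_time(segments) -> dict:
    """Sum the duration of all segments for each speaker."""
    totals = defaultdict(float)
    for seg in segments:
        if seg.end <= seg.start:
            raise ValueError(f"segment must have positive duration: {seg}")
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)


if __name__ == "__main__":
    segments = [
        Segment(0.0, 2.5, "A"),
        Segment(2.5, 4.0, "B"),
        Segment(4.0, 6.0, "A"),
    ]
    print(speaking_time(segments))
```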

Software Design Principles

VIA’s design emphasizes simplicity and minimalism, with a user interface constructed from standard HTML components styled using CSS. The software’s lightweight implementation, free from external dependencies, ensures compatibility and ease of use across various operating systems and browsers. The open-source community plays a crucial role in its continuous development, contributing feedback and code enhancements to maintain the software’s relevance and efficiency.

Open Source Ecosystem

The BSD open-source license has contributed significantly to VIA’s adoption and improvement. Community-driven development, facilitated through platforms like GitLab, has led to diverse contributions that keep VIA adaptable to new annotation challenges and technological advances.

Academic and Industrial Impact

VIA's influence spans multiple academic disciplines and industrial sectors. It has been instrumental in creating annotated datasets across fields such as humanities, computer science, history of art, physical sciences, and medicine. Furthermore, industrial entities have integrated VIA into their workflows, adapting the tool to meet specific annotation needs.

Future Development

Future enhancements of VIA aim to strengthen support for collaborative annotation of large-scale datasets and to integrate plugins that use state-of-the-art computer vision models to assist annotators. Such plugins, built on technologies like TensorFlow.js, could speed up annotation considerably by offering automated suggestions that annotators then refine.

Conclusion

The VIA annotation tool combines simplicity with functionality and has become a widely used resource for academic and industrial annotation tasks. Continued development, guided by user feedback and emerging technologies, positions VIA to remain a useful tool for manual annotation.
