
Tencent Video Dataset (TVD): A Video Dataset for Learning-based Visual Data Compression and Analysis (2105.05961v1)

Published 12 May 2021 in eess.IV

Abstract: Learning-based visual data compression and analysis have attracted great interest from both academia and industry recently. More training as well as testing datasets, especially good quality video datasets are highly desirable for related research and standardization activities. Tencent Video Dataset (TVD) is established to serve various purposes such as training neural network-based coding tools and testing machine vision tasks including object detection and tracking. TVD contains 86 video sequences with a variety of content coverage. Each video sequence consists of 65 frames at 4K (3840x2160) spatial resolution. In this paper, the details of this dataset, as well as its performance when compressed by VVC and HEVC video codecs, are introduced.


Summary

  • The paper introduces the Tencent Video Dataset (TVD), a new high-quality 4K video dataset designed to advance research in learning-based visual data compression and analysis.
  • TVD includes 86 diverse 4K video sequences, used to benchmark codecs such as VVC and HEVC, and an annotated image subset for benchmarking machine vision tasks such as object detection.
  • Evaluations on TVD show that VVC achieves substantially better compression efficiency than HEVC, with a BD-rate reduction of approximately 33% under the Random Access configuration.

The paper "Tencent Video Dataset (TVD): A Video Dataset for Learning-based Visual Data Compression and Analysis" describes the construction and use of the Tencent Video Dataset (TVD), which is intended to support research in learning-based visual data compression and analysis. The dataset addresses the shortage of high-quality video data needed to develop neural network-based video compression algorithms, and it also serves as a test set for machine vision tasks such as object detection and tracking.

Dataset Composition and Purpose:

  • TVD consists of 86 video sequences, each having 65 frames captured at a 4K resolution of 3840x2160 pixels.
  • It serves multiple research purposes, including training neural network-based coding tools and evaluating video codecs like the Versatile Video Coding (VVC) and High Efficiency Video Coding (HEVC).
  • For object detection, a subset of 166 images sampled from TVD is provided at 1920x1080 resolution with bounding box annotations; this subset is used for benchmarking in MPEG's Video Coding for Machines (VCM) activity.
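Detection benchmarks on bounding-box annotations like these typically match predicted boxes to ground-truth boxes by intersection over union (IoU). The sketch below is illustrative only: it assumes boxes are given as (x_min, y_min, x_max, y_max) pixel coordinates, and the `Box` type and `iou` helper are hypothetical names. The exact VCM evaluation protocol (for example, mAP thresholds) is not described here.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixel coordinates (assumed x_min, y_min, x_max, y_max convention)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Inverted or degenerate boxes are treated as having zero area.
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, in the range [0, 1]."""
    inter_w = max(0.0, min(a.x_max, b.x_max) - max(a.x_min, b.x_min))
    inter_h = max(0.0, min(a.y_max, b.y_max) - max(a.y_min, b.y_min))
    inter = inter_w * inter_h
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


# Usage: a hypothetical detection on a 1920x1080 image versus its ground-truth box.
gt = Box(100, 100, 300, 300)
pred = Box(150, 150, 350, 350)
print(round(iou(gt, pred), 4))  # 0.3913
```

A detection is usually counted as correct when its IoU with a ground-truth box exceeds a chosen threshold; the threshold itself is a property of the benchmark, not of this helper.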

Data Collection and Format:

  • Video sequences were captured with high-end cameras, including the RED Helium 8K, RED Monstro 8K, and Blackmagic URSA Mini Pro 12K, to provide high fidelity and a wide variety of scenes.
  • The captured footage was transcoded to the YUV 4:2:0 color format with FFmpeg; for specific clips, the higher original capture resolution may also be used.
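Because the sequences are stored as planar YUV 4:2:0, each frame is a full-resolution Y plane followed by U and V planes at half width and half height. The sketch below computes frame sizes and reads one frame from a raw file. It is a minimal sketch under stated assumptions: the bit depth is not given above, so it is a parameter here, and samples above 8 bits are assumed to be stored as 2-byte little-endian values (the usual raw layout). `plane_sizes`, `frame_size_bytes`, and `read_frame` are hypothetical helper names.

```python
import io
from typing import BinaryIO, Tuple


def plane_sizes(width: int, height: int, bit_depth: int = 8) -> Tuple[int, int]:
    """Byte sizes of the luma plane and of one chroma plane for planar 4:2:0."""
    bytes_per_sample = 1 if bit_depth <= 8 else 2  # >8-bit assumed stored in 16-bit words
    luma = width * height * bytes_per_sample
    chroma = (width // 2) * (height // 2) * bytes_per_sample  # 2x subsampled both ways
    return luma, chroma


def frame_size_bytes(width: int, height: int, bit_depth: int = 8) -> int:
    """Total bytes per frame: one luma plane plus two quarter-size chroma planes."""
    luma, chroma = plane_sizes(width, height, bit_depth)
    return luma + 2 * chroma


def read_frame(f: BinaryIO, index: int, width: int, height: int,
               bit_depth: int = 8) -> Tuple[bytes, bytes, bytes]:
    """Return the raw Y, U, V planes of frame `index` from a planar 4:2:0 stream."""
    luma, chroma = plane_sizes(width, height, bit_depth)
    size = luma + 2 * chroma
    f.seek(index * size)
    data = f.read(size)
    if len(data) != size:
        raise EOFError(f"frame {index} is truncated or out of range")
    return data[:luma], data[luma:luma + chroma], data[luma + chroma:]


# Usage: one 3840x2160 frame and a 65-frame sequence, assuming 8-bit samples.
per_frame = frame_size_bytes(3840, 2160)
print(per_frame, per_frame * 65)  # 12441600 808704000

# Tiny in-memory check: two 4x2 frames of 12 bytes each.
buf = io.BytesIO(bytes(range(24)))
y, u, v = read_frame(buf, 1, 4, 2)
```

At 8 bits, a single 65-frame 4K sequence is roughly 809 MB uncompressed, which is one reason compact codecs matter for a dataset of this size.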

Compression Analysis:

  • The paper evaluates TVD's performance when compressed using VVC (VTM-11.0) and HEVC (HM-16.23) codecs.
  • Four configurations, All Intra (AI), Random Access (RA), Low Delay B (LDB), and Low Delay P (LDP), are tested at quantization parameters (QPs) 22, 27, 32, 37, and 42.
  • Average PSNR values for the Y, U, and V components are reported for each codec, configuration, and QP.
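PSNR compares a reconstructed plane with the original through the mean squared error (MSE): PSNR = 10 · log10(MAX² / MSE), where MAX = 2^bit_depth − 1. The sketch below computes it for one plane given as a flat list of samples. It assumes that the sequence-level value is the arithmetic mean of per-frame PSNRs; the helper names are hypothetical, and this is not code from the paper or from the HM/VTM software.

```python
import math
from typing import Sequence


def plane_psnr(ref: Sequence[int], rec: Sequence[int], bit_depth: int = 8) -> float:
    """PSNR in dB between two equally sized planes given as flat sample sequences."""
    if len(ref) != len(rec) or not ref:
        raise ValueError("planes must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)
    peak = (1 << bit_depth) - 1  # 255 for 8-bit, 1023 for 10-bit
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)


def sequence_psnr(per_frame: Sequence[float]) -> float:
    """Sequence-level PSNR as the mean of per-frame values (assumed averaging convention)."""
    return sum(per_frame) / len(per_frame)


# Usage: every sample is off by one, so MSE = 1 and PSNR = 20 * log10(255).
ref = [100, 110, 120, 130]
rec = [101, 109, 121, 129]
print(round(plane_psnr(ref, rec), 2))  # 48.13
```

The same function applies to the Y, U, and V planes separately, which is how per-component averages like those in the paper are usually obtained.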

Key Findings:

  • VVC (VTM-11.0) provides clearly better compression efficiency than HEVC (HM-16.23): with PSNR as the distortion metric, the BD-rate reduction is approximately 33% under the RA configuration.
  • The gains are pronounced in all three components (Y, U, V); that is, VVC reaches the same PSNR at a lower bitrate.
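BD-rate summarizes the average bitrate difference between two rate-distortion curves over their overlapping quality range. The sketch below follows the classic Bjøntegaard approach: fit log10(bitrate) as a cubic polynomial of PSNR for each codec, integrate both fits over the shared PSNR interval, and convert the average log difference into a percentage. It is a hedged illustration, not the paper's tooling; JVET reporting templates may use piecewise cubic interpolation instead, so results can differ slightly. The rate and PSNR values in the usage are synthetic, not TVD results.

```python
import math
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def _cubic_fit(t: Sequence[float], y: Sequence[float]) -> List[float]:
    """Least-squares cubic c0 + c1*t + c2*t^2 + c3*t^3 via the normal equations."""
    ata = [[sum(ti ** (i + j) for ti in t) for j in range(4)] for i in range(4)]
    aty = [sum(yi * ti ** i for ti, yi in zip(t, y)) for i in range(4)]
    return _solve(ata, aty)


def _integral(c: Sequence[float], lo: float, hi: float) -> float:
    """Definite integral of the polynomial with coefficients c over [lo, hi]."""
    def prim(x: float) -> float:
        return sum(ci * x ** (i + 1) / (i + 1) for i, ci in enumerate(c))
    return prim(hi) - prim(lo)


def bd_rate(rate_a: Sequence[float], psnr_a: Sequence[float],
            rate_b: Sequence[float], psnr_b: Sequence[float]) -> float:
    """BD-rate (%) of test codec B versus anchor A; negative means B needs fewer bits."""
    lo = max(min(psnr_a), min(psnr_b))
    hi = min(max(psnr_a), max(psnr_b))
    if hi <= lo:
        raise ValueError("PSNR ranges do not overlap")
    mid = (lo + hi) / 2  # center the PSNR axis to keep the normal equations well conditioned
    fits = [_cubic_fit([p - mid for p in psnr], [math.log10(r) for r in rate])
            for rate, psnr in ((rate_a, psnr_a), (rate_b, psnr_b))]
    avg_a, avg_b = (_integral(c, lo - mid, hi - mid) / (hi - lo) for c in fits)
    return (10 ** (avg_b - avg_a) - 1) * 100


# Usage with synthetic curves: codec B needs 80% of A's bitrate at every PSNR.
psnrs = [30.0, 33.0, 36.0, 39.0, 42.0]
rates_a = [1000.0, 2000.0, 4000.0, 8000.0, 16000.0]
rates_b = [0.8 * r for r in rates_a]
print(round(bd_rate(rates_a, psnrs, rates_b, psnrs), 3))  # -20.0
```

Fitting in the log-rate domain makes a constant percentage saving show up as a constant vertical offset, which is why the synthetic 80% case yields exactly -20%.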

Conclusion and Applications:

  • TVD supports a variety of machine learning and standardization initiatives, as evidenced by its integration into JVET’s Neural Network-based Video Coding (NNVC) activities and MPEG VCM testing conditions.
  • The dataset is useful for training neural networks for video compression and for machine-oriented visual tasks such as object detection, segmentation, and tracking.

Restrictions and Usage:

  • Tencent, Shenzhen Boyan Technology Ltd., and Tsinghua University retain the intellectual property of the dataset. The specified permitted uses include technical papers, research, and standardization activities, while commercialization, redistribution, and use in marketing or entertainment are restricted.

With its high-resolution video sequences, annotated image subset, and baseline results for state-of-the-art video codecs, TVD is positioned to support both academic and industrial work in visual data compression and analysis.
