
Chinese Opera Video Clip (COVC) Dataset

Updated 16 November 2025
  • COVC is a domain-specific dataset featuring over 115,000 archival Chinese opera clips, highlighting rich high-frequency textures and challenging inter-frame motion.
  • The dataset is meticulously preprocessed into septuple-frame sequences with stringent quality controls and data augmentation for robust STVSR evaluation.
  • Benchmark comparisons indicate that COVC exceeds conventional datasets such as Vimeo90K in textural detail and motion range, making it a demanding testbed for heritage-preservation and video-restoration research.

The Chinese Opera Video Clip (COVC) dataset is a large-scale, domain-specific benchmark for space-time video super-resolution (STVSR), directly targeting the unique restoration and enhancement challenges posed by historical Chinese opera recordings. Comprising over 115,000 curated video clips extracted from mid-20th century to early digital-era archival materials, COVC offers a combination of high-frequency textures and pronounced inter-frame motion not represented in pre-existing benchmarks. The dataset facilitates research on super-resolution methods tailored to cultural-heritage preservation, supporting both quantitative evaluation and qualitative assessment on a domain of complex theatrical content.

1. Collection, Source Material, and Preprocessing

COVC is constructed from 33 distinct archival Chinese-opera videos, spanning a temporal range from mid-20th century film to early digital recordings. Original sources include single-camera stage captures, with spatial resolutions distributed as follows: nine at 1920×1080 (1080p), eight at 1280×720 (720p), and sixteen at 854×480 (480p) or lower. The majority of clips are recorded at 24 fps (cinematic standard), with a minority at 60 fps to capture high-motion scenes. Limitations of historical acquisition devices (both in spatial and temporal fidelity) motivated stringent clip selection based on bitrate, visual quality, and the exclusion of frames exhibiting all-black borders to avoid distorting standard quantitative metrics such as PSNR.

The preprocessing pipeline extracts continuous septuples (sequences of seven consecutive frames) from source videos, discarding sequences with boundary artifacts. Data augmentation includes random cropping (128×128 from HR frames), horizontal/vertical flips, and 90° rotations. Low-resolution (LR) crops are synthesized using bicubic downsampling by a factor $s=2$, leading to LR images sized 64×64. This follows the transformation $I^L = \downarrow_s(I^H)$, where $\downarrow_s$ denotes bicubic downsampling on both axes.
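The windowing and augmentation steps described above can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the function names, the `stride` parameter (the source does not say whether septuples overlap), and the 50% flip probability are assumptions. The sketch only plans crop and augmentation parameters; the bicubic resampling itself would be delegated to an image library.

```python
import random


def septuple_starts(num_frames, length=7, stride=7):
    """Start indices of every full window of `length` consecutive frames.

    `stride` is a hypothetical knob: stride=7 gives non-overlapping
    septuples, stride=1 gives a sliding window.
    """
    return list(range(0, num_frames - length + 1, stride))


def plan_augmentation(hr_height, hr_width, patch=128, scale=2, rng=None):
    """Draw one random crop/flip/rotation plan for an HR frame.

    The same plan would be applied to all seven frames of a septuple so
    that the clip stays temporally consistent (an assumption).
    """
    rng = rng or random.Random()
    top = rng.randrange(hr_height - patch + 1)
    left = rng.randrange(hr_width - patch + 1)
    return {
        "hr_box": (top, left, top + patch, left + patch),  # (y0, x0, y1, x1)
        "lr_size": (patch // scale, patch // scale),       # size after bicubic downsampling
        "hflip": rng.random() < 0.5,
        "vflip": rng.random() < 0.5,
        "rot90_turns": rng.randrange(4),                   # multiples of 90 degrees
    }


if __name__ == "__main__":
    print(septuple_starts(20))            # windows starting at frames 0 and 7
    print(plan_augmentation(1080, 1920))  # one random plan for a 1080p frame
```

For a 1080p source, `plan_augmentation(1080, 1920)` yields a 128×128 HR box whose ×2 downsample is 64×64, matching the patch sizes quoted above.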

2. Statistical Properties and Dataset Splits

COVC comprises 115,548 clips ($N$), each containing $L=7$ frames (for a total of 808,836 frames). During training, networks operate on $128\times128$ HR patches and $64\times64$ LR patches. The train/test partition is approximately 90%/10%: 104,138 clips for training and 11,410 for testing. The test set is stratified by subjective visual clarity into 5,120 high-, 3,150 medium-, and 3,140 low-quality clips. No dedicated validation set is provided; cross-validation is performed at the user’s discretion. This test stratification facilitates performance analysis across varying restoration difficulties.
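The counts quoted above are mutually consistent, which the following short check makes explicit. The variable names are illustrative; the numbers are taken directly from the text.

```python
N_CLIPS = 115_548
FRAMES_PER_CLIP = 7
TRAIN_CLIPS = 104_138
TEST_CLIPS = 11_410
TEST_TIERS = {"high": 5_120, "medium": 3_150, "low": 3_140}

total_frames = N_CLIPS * FRAMES_PER_CLIP
test_share = TEST_CLIPS / N_CLIPS

# The split and the quality tiers should each add up exactly.
assert TRAIN_CLIPS + TEST_CLIPS == N_CLIPS
assert sum(TEST_TIERS.values()) == TEST_CLIPS
assert total_frames == 808_836

if __name__ == "__main__":
    print(f"total frames: {total_frames:,}")
    print(f"test share:   {test_share:.1%}")
    for tier, count in TEST_TIERS.items():
        print(f"{tier:>6}: {count:,} clips ({count / TEST_CLIPS:.1%} of test)")
```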

COVC demonstrates high textural complexity, with per-frame high-frequency content averaging approximately 75% (measured via high-pass filtering), compared to 65% in the widely used Vimeo90K dataset. Furthermore, COVC is characterized by sizable inter-frame motions (e.g., head gestures, rapid costume changes), which are qualitatively observed to exceed those in conventional video datasets, although this has not been quantified via optical-flow histograms.
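The source does not specify the high-pass filter or how the percentage is computed. Purely as an illustration of one way such a statistic could be defined, the sketch below counts the fraction of interior pixels whose 4-neighbour Laplacian response exceeds a threshold. The function name, the kernel, and the threshold are all assumptions, so its output should not be expected to reproduce the 75% and 65% figures.

```python
def high_frequency_fraction(gray, threshold=10.0):
    """Fraction of interior pixels with a strong high-pass response.

    `gray` is a 2D list of grayscale intensities. The 4-neighbour
    Laplacian used here is an assumed stand-in for the unspecified
    high-pass filter in the source.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("frame must be at least 3x3")
    hits = 0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            laplacian = (4 * gray[y][x]
                         - gray[y - 1][x] - gray[y + 1][x]
                         - gray[y][x - 1] - gray[y][x + 1])
            if abs(laplacian) > threshold:
                hits += 1
    return hits / ((height - 2) * (width - 2))


if __name__ == "__main__":
    flat = [[128] * 8 for _ in range(8)]
    checker = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
    print(high_frequency_fraction(flat))     # no texture at all
    print(high_frequency_fraction(checker))  # texture at every pixel
```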

3. Annotation Framework and Metadata

Each septuple carries quality-tier annotation (high, medium, low) assigned through subjective clarity assessment. Metadata is consolidated in a JSON index keyed by clip, providing original resolution, frame rate, quality tier, and a source video identifier. No actor identity or opera-type taxonomies are present in the initial release, making the dataset agnostic to performer and sub-genre but tailored for video restoration research. The internal metadata structure enables reproducible splits and downstream analysis or reorganization as required by new STVSR methodologies.
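As a rough picture of how such an index might be consumed, the sketch below parses a small JSON document and groups clips by quality tier. The schema has not been published at the time of writing, so every key name (`source_video`, `resolution`, `fps`, `quality_tier`) and every clip identifier here is hypothetical; only the four kinds of information come from the description above.

```python
import json
from collections import Counter

# Hypothetical index entries; the real key names may differ.
EXAMPLE_INDEX = """
{
  "clip_000001": {"source_video": "video_01", "resolution": [1920, 1080], "fps": 24, "quality_tier": "high"},
  "clip_000002": {"source_video": "video_01", "resolution": [1920, 1080], "fps": 24, "quality_tier": "medium"},
  "clip_000003": {"source_video": "video_17", "resolution": [854, 480], "fps": 24, "quality_tier": "low"},
  "clip_000004": {"source_video": "video_09", "resolution": [1280, 720], "fps": 60, "quality_tier": "high"}
}
"""


def clips_by_tier(index, tier):
    """Sorted clip keys whose quality tier equals `tier`."""
    return sorted(key for key, meta in index.items() if meta["quality_tier"] == tier)


def tier_counts(index):
    """Number of clips per quality tier."""
    return Counter(meta["quality_tier"] for meta in index.values())


index = json.loads(EXAMPLE_INDEX)

if __name__ == "__main__":
    print(clips_by_tier(index, "high"))
    print(dict(tier_counts(index)))
```

Keeping the source video identifier in each entry also makes it possible to build splits that never place clips from the same recording on both sides of a train/validation boundary.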

4. Quality Assurance Protocols and STVSR Benchmarks

Quality assurance is managed through manual curation: all sequences with severe visual corruption or all-black borders are removed. Consistency checks guarantee that every septuple contains only visually valid, artifact-free data.
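The source describes this curation as manual. An automated pre-screen could flag candidates for that manual pass; the sketch below is one such assumed helper, not part of the described protocol. It marks a frame as bordered when any edge strip is near-black throughout. The function names, the strip width `border`, and the darkness cutoff `max_level` are hypothetical.

```python
def has_black_border(gray, border=8, max_level=16):
    """True if any of the four edge strips is near-black everywhere.

    `gray` is a 2D list of grayscale intensities in [0, 255].
    """
    height, width = len(gray), len(gray[0])
    if 2 * border >= min(height, width):
        raise ValueError("border is too wide for this frame")
    strips = (
        gray[:border],                     # top rows
        gray[-border:],                    # bottom rows
        [row[:border] for row in gray],    # left columns
        [row[-border:] for row in gray],   # right columns
    )
    return any(
        all(pixel <= max_level for row in strip for pixel in row)
        for strip in strips
    )


def septuple_is_clean(frames, **kwargs):
    """A septuple passes only if none of its frames has a black border."""
    return not any(has_black_border(frame, **kwargs) for frame in frames)


if __name__ == "__main__":
    clean = [[128] * 32 for _ in range(32)]
    letterboxed = ([[0] * 32 for _ in range(8)]
                   + [[128] * 32 for _ in range(16)]
                   + [[0] * 32 for _ in range(8)])
    print(septuple_is_clean([clean] * 7))
    print(septuple_is_clean([clean] * 6 + [letterboxed]))
```

Black borders matter because they are trivially reconstructed, so leaving them in would inflate pixel-wise scores such as PSNR.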

For benchmarking, representative STVSR models, including VideoINR (CVPR’22), RSTT (CVPR’22), Cycmunet+ (TPAMI’23), 3DAttGAN (TETCI’24), and BF-STVSR (arXiv’25), are retrained on COVC. On the medium-quality test subset, the best-performing prior models (e.g., 3DAttGAN) achieve $\approx 31.67$ dB PSNR and 0.837 SSIM, while the MambaOVSR model reports $\approx 31.86$ dB PSNR and 0.9438 SSIM. Across all test subsets, MambaOVSR is reported to improve PSNR by 5–6% in relative terms and SSIM by 0.04–0.06 in absolute terms over the prior state of the art, which indicates that COVC discriminates between methods on both spatial and temporal fidelity.
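For reference, PSNR follows the standard definition $\mathrm{PSNR} = 10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$, where MSE is the mean squared error between the reference and restored frames. The sketch below is a minimal pure-Python version for 8-bit grayscale frames. It is not the evaluation code behind the numbers above; details such as colour space and border handling are not stated in the source and are left out here.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D frames."""
    ref = [pixel for row in reference for pixel in row]
    out = [pixel for row in restored for pixel in row]
    if len(ref) != len(out) or not ref:
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    reference = [[100, 110], [120, 130]]
    restored = [[101, 109], [121, 129]]  # every pixel off by one level
    print(f"{psnr(reference, restored):.2f} dB")
```

Per-clip scores would typically be averaged over the frames of a septuple and then over the clips in a quality tier, although the exact averaging used for the reported figures is not given.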

5. Comparison to Other STVSR Benchmarks

A direct comparison highlights COVC’s unique contribution to the field:

| Dataset | # Clips | Clip Length (frames) | Domain | Texture Density | Motion Range |
|---|---|---|---|---|---|
| Vid4 | 4 | 50–100 | General (small) | Low | Mild |
| REDS | 300 | ~100 | Real-world | Medium | Medium–High |
| Vimeo90K-T | ~90,000 | 7 | General scenes | ~65% high-freq | Moderate |
| COVC (ours) | 115,548 | 7 | Chinese opera | ~75% high-freq | Large |

COVC provides approximately 1.3× as many clips as Vimeo90K-T (115,548 vs. ~90,000), with greater textural detail (75% vs. 65% high-frequency content), and is the first benchmark to focus exclusively on traditional Chinese opera. It includes more extreme inter-frame displacements than commonly used alternatives, addressing the motion regimes typical of theatrical performance but absent in standard video datasets.

6. Access, Licensing, and Application Scope

The COVC dataset and associated codebase are scheduled for public release concurrent with the originating paper’s publication. Distribution will occur under a non-commercial research license (CC-BY-NC), with comprehensive access instructions, JSON metadata schema, and usage guidelines documented in the repository’s README. The dataset structure and licensing model facilitate adoption by both the STVSR and digital-heritage preservation communities.

A plausible implication is that COVC will accelerate the development and assessment of novel super-resolution architectures—especially those incorporating explicit global motion modeling—by serving as a domain-realistic, high-motion, high-frequency testbed. Its focus on historically and culturally significant content positions it as an essential resource for video restoration and intangible-heritage archival research.

7. Significance for Cultural Heritage and Scientific Advancement

COVC closes a critical methodological gap in STVSR: the scarcity of large, rigorously curated datasets reflecting the complexities of archival performing arts. By offering over 115,000 Chinese opera clips with both rich high-frequency detail and challenging temporal dynamics, it supports the creation and scrutiny of restoration tools ultimately intended for practical deployment in cultural-heritage institutions. Future research may extend COVC by incorporating additional metadata (such as performer identity or opera type) or by pairing low- and high-quality digital transfers for learning-based domain adaptation. The dataset establishes a rigorous benchmark for evaluating both conventional and advanced model architectures aimed at recovering lost detail in complex, historically significant footage.
