HFR-LS: High Frame Rate Live Streaming Dataset

Updated 3 February 2026

HFR-LS is a high frame rate live streaming benchmark that evaluates temporal fidelity versus compression under fixed bitrate constraints.
It comprises 32 source videos encoded into 384 clips using H.264 with bitrates from 5 to 15 Mbps and frame rates from 30 to 120 fps, reflecting real streaming ladders.
Subjective tests using ITU-R BT.500 protocols reveal significant frame rate and bitrate interactions, informing adaptive streaming and VQA model development.

The High Frame Rate Live Streaming (HFR-LS) Dataset is a subject-rated video database that systematically explores the perceptual trade-offs between frame rate (temporal fidelity) and compression (spatial fidelity) under fixed bitrate constraints, reflecting realistic encoding scenarios in live video streaming. Designed for benchmarking perceptual quality, the HFR-LS dataset supports the development and evaluation of both encoding schemes and video quality assessment (VQA) models for bitrate-constrained high-frame-rate applications (He et al., 27 Jan 2026).

1. Scope, Content, and Encoding Paradigm

HFR-LS is constructed from 32 source video sequences, each 5 seconds in duration, originally captured at 120 frames per second (fps) in YUV 4:2:0 8-bit format and downscaled to 1920×1080 resolution. Sources are drawn from BVI-HFR (22 sequences), UVG (5 sequences), and LIVE-YT-HFR (5 sequences), yielding diverse content with broad coverage in spatial information (SI) and temporal information (TI), per ITU-T P.910 recommendations. Approximately one-third of sequences (12/32) contain camera motion, while others vary from low to high temporal complexity.

Each source video is re-encoded using the H.264 codec (FFmpeg preset=fast, tune=zerolatency) at fixed 1080p resolution via four discrete target bitrates—5, 7, 10, and 15 Mbps—and three frame rates—30, 60, and 120 fps—with frame-rate reduction realized through frame dropping. This encoding "ladder" results in 32 × (4 × 3) = 384 processed video clips. The dataset is organized to represent prototypical "live streaming ladder" choices as recommended by leading streaming platforms (e.g., YouTube, Twitch, Facebook Live) (He et al., 27 Jan 2026).

2. Subjective Evaluation Methodology

The subjective quality evaluation is conducted under a single-stimulus, hidden-reference protocol conforming to ITU-R BT.500-15. All 384 processed clips, together with their respective 120 fps/maximum-quality reference, are presented in randomized order on a calibrated 120 Hz, 1080p monitor. Thirty naïve observers (12 female, 18 male; age 20–30; normal or corrected vision) participate, with each subject rating all videos independently on a continuous 0–100 quality scale.

Sessions (~2,080 seconds total video) are divided into two blocks separated by a 5-minute break to mitigate observer fatigue. A brief training phase familiarizes each subject with the protocol. Outlier scores are identified and removed according to ITU-R BT.500 guidelines. Each subject’s scores $s_{ij}$ (for subject $i$ on clip $j$ ) are normalized to z-scores:

$z_{ij} = \frac{s_{ij} - \mu_i}{\sigma_i}$

Linearly rescaled:

$z'_{ij} = 100 \cdot \frac{z_{ij} + 6}{9}$

The mean opinion score (MOS) for each clip is:

$\mathrm{MOS}_j = \frac{1}{N} \sum_i z'_{ij}$

Difference MOS (DMOS) is reported relative to the hidden 120 fps reference:

$\mathrm{DMOS}_j = \mathrm{MOS}_\mathrm{ref} - \mathrm{MOS}_j$

Inter-subject consistency is high (median Pearson $r\approx0.91$ over random splits), indicating reliable perceptual data.

3. Data Organization and Accessibility

The dataset is hosted at https://github.com/real-hjq/HFR-LS and includes:

videos/
- source/: 32 uncompressed YUV420p reference sequences at 120 fps.
- processed/: 384 H.264 MP4s, named SRCXX_BRYY_FRZZ.mp4 (XX=seq. ID, YY=bitrate Mbps, ZZ=frame rate).
metadata.csv: Contains source ID, bitrate (Mbps), frame rate (fps), SI, TI, MOS, DMOS for each processed clip.
dmosaes/: 384×1 vector of DMOS scores, ordered by clip.
scripts/: Python tools for metadata parsing, objective metric computation, and figure regeneration (with requirements.txt).
Download: git clone https://github.com/real-hjq/HFR-LS.git
Usage: Load metadata (python load_metadata.py), evaluate reference models (python eval_vqa.py).

Researchers can directly access the quality-annotated processed clips, metadata, and analysis scripts for benchmarking and custom VQA model development.

4. Perceptual Findings and Statistical Analysis

The subjective study exposes a statistically significant main effect of frame rate on perceived quality (one-way ANOVA $p=4.47\times10^{-4}$ ). Significant interactions are found between bitrate and frame rate (two-way ANOVA $p=4.15\times10^{-5}$ ), and between source content and frame rate ( $p=4.79\times10^{-3}$ ).

Key empirical results:

At low bitrates (5 Mbps), high frame rates (120 fps) introduce pronounced compression artifacts, degrading quality.
For sequences with camera motion ( $\mathrm{TI}>6$ ), 60 fps outperforms 30 fps at moderate bitrates. 120 fps only surpasses 60 fps at bitrates above ~12 Mbps.
For static or low-motion content ( $\mathrm{TI}<6$ ), frame rate differentials remain minor until bitrates exceed ~10 Mbps, where 60 fps and 120 fps offer modest subjective gains.

These findings clarify that optimal bitrate allocation between spatial and temporal fidelity should be content-aware, especially under strict bandwidth budgets.

5. Objective VQA Metrics and Benchmarking

Full-reference objective VQA metrics computed on the dataset include PSNR, SSIM, LPIPS, DISTS, and VMAF, accompanied by no-reference metrics such as NIQE, VSFA, Li22, DOVER, ModularVQA, and MinimalisticVQA. PSNR is calculated as:

$\mathrm{PSNR} = 10 \log_{10} \left(\frac{MAX_I^2}{\mathrm{MSE}} \right)$

where $MAX_I=255$ for 8-bit video. SSIM is defined per Wang et al.:

$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy}+C_2)}{(\mu_x^2+\mu_y^2 + C_1)(\sigma_x^2+\sigma_y^2 + C_2)}$

To benchmark objective models against subjective scores, predictions $p_j$ are mapped to the subjective DMOS range using a four-parameter logistic function (VQEG):

$f(p) = \beta_1 - \frac{\beta_2}{1+\exp\left(-\frac{p-\beta_3}{|\beta_4|}\right)} + \beta_2$

The Pearson Linear Correlation Coefficient (PLCC) is then computed between $f(p_j)$ and $\mathrm{DMOS}_j$ . Spearman’s Rank Correlation Coefficient (SRCC) is also reported. Benchmark results indicate that MinimalisticVQA achieves $\mathrm{PLCC}\approx0.596$ and $\mathrm{SRCC}\approx0.407$ , revealing substantial headroom for models explicitly capturing temporal effects and the bitrate-frame-rate trade-off (He et al., 27 Jan 2026).

6. Significance and Context in HFR Video Assessment

HFR-LS is positioned among a limited set of publicly available HFR subjective datasets; it is unique in its systematic exploration of the joint effects of compression level and frame rate within a controlled bitrate ladder, designed for the live streaming context. Sequences are derived in part from major HFR benchmarks including BVI-HFR (Madhusudana et al., 2020), UVG, and LIVE-YT-HFR (Madhusudana et al., 2020), but HFR-LS distinguishes itself by its detailed, bitrate-aware encoding design and comprehensive MOS/DMOS labeling aligned with live streaming deployment strategies.

The dataset serves as a testbed for advancing VQA research in dynamic adaptive streaming, where efficient allocation between spatial and temporal fidelity is central. A plausible implication is that VQA models optimized with this dataset will better support perceptual-based bitrate adaptation algorithms, particularly for live streaming scenarios featuring camera motion, variable content complexity, and stringent real-time constraints.

HFR-LS complements and extends prior HFR datasets such as LIVE-YT-HFR, which explores a broader range of frame rates (24–120 fps) and compression settings but is not specifically tailored to live streaming ladders or constrained bitrate regimes (Madhusudana et al., 2020). Datasets such as Need for Speed (NfS) target object tracking accuracy at extreme frame rates (240 fps) for vision tasks, emphasizing tracker performance rather than perceptual compression artifacts (Galoogahi et al., 2017).

HFR-LS's explicit consideration of both encoding parameters and subjective quality under conditions representative of live platform streaming ladders makes it a pivotal resource for research in real-time adaptive streaming, perceptual optimization, and objective video quality prediction under bandwidth constraints.