SFV+HDR Quality Assessment

Updated 24 April 2026

SFV+HDR Quality is the evaluation of high dynamic range in short-form videos, leveraging both subjective testing and objective metrics to capture unique visual characteristics.
The assessment integrates deep feature-based VQA methods with perceptually tuned datasets to accurately model tone-mapping, distortion modes, and codec impacts.
Advancements such as HDRPatchMAX and HDRMAX-enhanced metrics demonstrate significant improvements in reliability, efficiency, and cross-device generalization.

The term SFV+HDR Quality refers to the perceptual and algorithmic assessment of short-form video (SFV) content presented with High Dynamic Range (HDR) encoding. SFV+HDR assessment is technically distinct from traditional User Generated Content (UGC) or Standard Dynamic Range (SDR) video evaluation because of the visual statistics, distortion modes, and dynamic range characteristics inherent to HDR acquisition, tone-mapping, compression, and display. This entry surveys the methodologies, datasets, metrics, and findings specific to SFV+HDR quality, covering both subjective testing and objective metric development, with citations to recent arXiv papers.

1. Datasets and Subjective Assessment Protocols

Large-scale, domain-representative subjective datasets underpin current SFV+HDR quality research. The "YouTube SFV+HDR Quality Dataset" consists of 4,030 short-form videos (5 s, 1080×1920 at 30 fps) spanning 10 content categories (Animal, Cooking, Dance, Gameplay, Health, Hobby, Music, Society, Speech, Sports): 2,030 native SDR clips and 2,000 PQ/HLG HDR clips, a subset of which underwent platform tone-mapping to produce HDR2SDR pairs. A three-step stratified sampling framework is used to obtain uniform coverage of the space spanned by SI (spatial information), TI (temporal information), and UVQ (a blind quality proxy). Each clip is rated on a 1–5 scale by 25–40 (SDR) or 10 (HDR) professional labelers, and the ratings are summarized as a mean opinion score (MOS) with a confidence interval (Wang et al., 2024).
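
As an illustration of how per-clip ratings turn into a MOS with a confidence interval, the following is a minimal sketch. The rating values are hypothetical, and the normal-approximation interval (z = 1.96) is an assumption rather than the procedure used by the dataset authors; with only 10 raters, a Student-t critical value would give a somewhat wider interval.

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Uses the normal approximation; z is an assumed critical value.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings")
    mos = mean(ratings)
    # Standard error of the mean from the sample standard deviation
    half_width = z * stdev(ratings) / math.sqrt(n)
    return mos, half_width


# Hypothetical ratings from 10 labelers on the 1-5 scale
ratings = [4, 5, 4, 3, 4, 4, 5, 3, 4, 4]
mos, ci = mos_with_ci(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

The smaller HDR rater pool (10 versus 25–40 for SDR) directly widens the interval through the square-root term, which is worth keeping in mind when comparing SDR and HDR correlations.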

The Beyond8Bits dataset expands the scale for UGC: 44,000 HDR video clips (≤10 s), 1.5M crowd ratings, diverse device capture, and compression ladders (0.2–5 Mbps, 360p–1080p). Ratings were collected on ITU-R BT.500-14 continuous scales and aggregated with SUREAL maximum-likelihood estimation (Saini et al., 1 Mar 2026).

Controlled laboratory studies such as AIC-HDR2025 target near-identical source material, producing 100 HDR test images (5 sources × 4 codecs × 5 QPs) canonicalized in Rec. 2100 PQ, RGB 10 bit. Subjective fidelity is measured using the JPEG AIC-3 protocol, combining plain and boosted (zoomed/flickered) triplet comparisons and reconstructing JND scales with maximum-likelihood Thurstone Case V modeling, producing sub-JND resolution (CI ≈ 0.27 at 1 JND) (Jenadeleh et al., 14 Jun 2025).
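
To make the JND reconstruction step more concrete, here is a simplified sketch of Thurstone Case V scaling. It is not the AIC-3 pipeline: it assumes plain pairwise counts instead of triplets, uses the classical least-squares closed form instead of maximum likelihood, and adopts the common convention that 1 JND corresponds to a 75% discrimination probability. The count matrix is hypothetical.

```python
from statistics import NormalDist


def thurstone_case_v(wins, eps=0.01):
    """Estimate impairment scale values (in JND units) from pairwise counts.

    wins[i][j] = number of times stimulus i was judged more distorted than j.
    Every pair is assumed to have been compared at least once.
    Stimulus 0 is used as the zero point of the scale.
    """
    n = len(wins)
    norm = NormalDist()
    z75 = norm.inv_cdf(0.75)  # assumed convention: 75% detection = 1 JND
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j:
                continue
            p = wins[i][j] / (wins[i][j] + wins[j][i])
            p = min(max(p, eps), 1.0 - eps)  # keep the probit finite
            total += norm.inv_cdf(p)
        scores.append(total / n)  # least-squares Case V solution
    return [(s - scores[0]) / z75 for s in scores]


# Hypothetical counts for: reference, mild distortion, strong distortion
wins = [
    [0, 25, 10],
    [75, 0, 25],
    [90, 75, 0],
]
jnd = thurstone_case_v(wins)
print([round(v, 2) for v in jnd])
```

A maximum-likelihood fit, as used in the cited study, additionally weights each comparison by its number of trials and yields confidence intervals for the scale values.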

2. Distortion Taxonomies and Failure Modes

SFV+HDR content exhibits a suite of perceptually distinct artifacts compared to SDR. Principal HDR-specific distortions include near-black crushing (detail loss in shadows and increased noise visibility), highlight clipping (loss of texture, "blown-out" regions), banding (contour visibility from quantization), and exposure flicker (temporal incoherence from auto-exposure or rate-control). UGC-specific phenomena such as blocking, ringing, and chroma bleeding can be accentuated by wide color gamuts and elevated bit depth. HDR2SDR conversions collapse dynamic range, degrading objective metric correlation and masking reference detail (Saini et al., 1 Mar 2026, Wang et al., 2024).

In display-dependent studies, crossover points in MOS preference between HDR and SDR are documented (e.g., OLED: HDR preferred across bitrates; entry LED: SDR preferred at low bitrates, crossover at ≈6 Mbps) (Ebenezer et al., 2023). Content-dependence is salient: synthetic/gameplay content and rapid view changes challenge feature extraction, yielding systematically lower MOS and SRCC for native and converted HDR.
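
A crossover point of this kind can be located by interpolating the difference between the two MOS curves. The sketch below assumes linear interpolation between measured bitrates, and the MOS values are invented for illustration only; they are not taken from the cited study.

```python
def find_crossover(bitrates, mos_hdr, mos_sdr):
    """Return the first bitrate where the HDR-minus-SDR MOS difference
    changes sign (linear interpolation), or None if it never does."""
    diffs = [h - s for h, s in zip(mos_hdr, mos_sdr)]
    for k in range(len(diffs) - 1):
        d0, d1 = diffs[k], diffs[k + 1]
        if d0 == 0:
            return bitrates[k]
        if d0 * d1 < 0:
            # Fraction of the interval at which the difference reaches zero
            t = d0 / (d0 - d1)
            return bitrates[k] + t * (bitrates[k + 1] - bitrates[k])
    if diffs and diffs[-1] == 0:
        return bitrates[-1]
    return None


# Hypothetical MOS curves for one display (bitrates in Mbps)
bitrates = [2, 4, 8, 12]
mos_hdr = [2.0, 2.8, 3.9, 4.4]
mos_sdr = [2.6, 3.1, 3.7, 3.9]
crossover = find_crossover(bitrates, mos_hdr, mos_sdr)
print(crossover)
```

In practice the MOS confidence intervals should be taken into account before treating a crossover as meaningful, since two curves that differ by less than their intervals may cross by chance.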

3. Objective Quality Metrics for SFV+HDR

Objective assessment models divide into no-reference (NR) and full-reference (FR) approaches. Among NR models, deep feature-based methods such as DOVER, FAST-VQA, and FasterVQA retain moderate correlation on native SDR short-form video (PLCC ≈0.75–0.80) but degrade on HDR2SDR content (PLCC lower by ≈0.13–0.17) and are especially challenged by content-adaptive distortions (Wang et al., 2024).
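
The correlation figures quoted throughout this entry are Pearson (PLCC) and Spearman rank (SRCC, also written SROCC) coefficients between model predictions and subjective scores. The following is a small standard-library sketch of both; the scores are hypothetical. Note that VQA papers commonly fit a logistic mapping before computing PLCC, a step omitted here.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical subjective scores and model predictions for five clips
mos = [1.2, 2.5, 3.1, 4.0, 4.6]
pred = [10, 30, 25, 60, 90]
srcc = spearman(mos, pred)
plcc = pearson(mos, pred)
print(round(srcc, 3), round(plcc, 3))
```

SRCC depends only on ordering, so it is unaffected by a monotonic mismatch between the model's output scale and the MOS scale, whereas PLCC is sensitive to it.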

HDR-targeted NR models improve performance by introducing bit-depth and dynamic range sensitivity:

HDRMAX augments classic natural video statistics pipelines (BRISQUE, TLVQM, ChipQA, VBLIINDS) with 18 (static) or 72 (spatio-temporal) features derived from locally normalized, double-exponential luminance mappings, improving median SRCC by roughly 20–57% on HDR databases (Ebenezer et al., 2023).
HDRPatchMAX fuses four feature groups (NIQE, PatchMAX contrast segmentation, HDRMAX, and space-time gradient chips), reaching SRCC ≈0.86 on multi-display subjective studies and outperforming all prior NR models in stability and cross-device generalization (Ebenezer et al., 2023).
HIDRO-VQA employs a ResNet-50 base encoder, self-supervised contrastive fine-tuning on unlabeled HDR clips, and a linear SVR regressor, attaining SRCC ≈0.88 on LIVE-HDR and outperforming classical VQA models (Saini et al., 2023).
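
The idea behind a locally normalized, double-exponential luminance mapping can be sketched as follows. This is an illustrative reading of the description above, not the reference HDRMAX implementation: the patch is a flat list of luminance values, the rescaling to [-1, 1] uses the patch minimum and maximum, and the steepness parameter `delta` is an assumed value.

```python
import math


def expansive_nonlinearity(patch, delta=4.0):
    """Rescale a patch to [-1, 1], then expand both ends exponentially.

    Values near the local maximum and minimum are stretched apart, while
    mid-range values are compressed toward zero.
    """
    lo, hi = min(patch), max(patch)
    if hi == lo:
        return [0.0] * len(patch)  # flat patch: nothing to expand
    out = []
    for v in patch:
        x = 2.0 * (v - lo) / (hi - lo) - 1.0
        if x >= 0:
            out.append(math.exp(delta * x) - 1.0)
        else:
            out.append(1.0 - math.exp(-delta * x))
    return out


# Hypothetical normalized luminance values from one local window
patch = [0.0, 0.25, 0.5, 0.75, 1.0]
mapped = expansive_nonlinearity(patch)
print([round(v, 2) for v in mapped])
```

Because the mapping is odd-symmetric, the darkest and brightest pixels of each window are emphasized equally, so natural-scene statistics computed afterwards become more sensitive to distortions in shadows and highlights.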

In the FR setting, classical metrics (PSNR, SSIM, MS-SSIM, VMAF, SpEED, STRRED, STGREED) offer limited HDR correlation (e.g., VMAF SRCC ≈0.56), largely due to 8-bit input constraints and failure to model PQ or display EOTFs (Ebenezer et al., 2023). Notable FR advances include:

FUNQUE+, a wavelet-based regression using MS-ESSIM, DLM, SRRED/TRRED, Edge features and HDRMAX side-channels, exceeding deep VMAF+HDRMAX2 and HIDRO-FR with only seven optimized features, operating at ~12 fps on 4K (Venkataramanan et al., 2023).
AIC-HDR2025 benchmarks converge on HDR-VDP-2 (PLCC=0.936, SROCC=–0.946) as SOTA, with Butteraugli-pnorm and SSIMULACRA2 as strong alternatives. Tone mapping enhances SDR metric performance on HDR content (Jenadeleh et al., 14 Jun 2025).
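
Since several of the failures above are attributed to metrics ignoring the PQ transfer function, a compact sketch of the PQ EOTF and its inverse (as defined in SMPTE ST 2084 and ITU-R BT.2100) may help. The constants are those of the standard; the function names are chosen here for illustration.

```python
# PQ constants from SMPTE ST 2084 / ITU-R BT.2100
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_eotf(signal):
    """Map a nonlinear PQ signal in [0, 1] to display luminance in cd/m^2."""
    ep = signal ** (1.0 / M2)
    num = max(ep - C1, 0.0)
    den = C2 - C3 * ep
    return 10000.0 * (num / den) ** (1.0 / M1)


def pq_inverse_eotf(luminance):
    """Map display luminance in cd/m^2 (0 to 10000) to a PQ signal in [0, 1]."""
    y = (luminance / 10000.0) ** M1
    return ((C1 + C2 * y) / (1.0 + C3 * y)) ** M2


peak = pq_eotf(1.0)                    # 10000 cd/m^2 at full signal
sdr_white = pq_inverse_eotf(100.0)     # signal level for 100 cd/m^2
code_10bit = round(sdr_white * 1023)   # full-range 10-bit code value
print(peak, round(sdr_white, 4), code_10bit)
```

Roughly half of the PQ signal range is spent below 100 cd/m^2, so a metric that treats PQ code values as if they were gamma-encoded SDR samples will weight luminance differences very differently from how a viewer perceives them.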

4. Algorithmic and Statistical Best Practices

Key methodological recommendations have emerged:

SFV+HDR assessment should employ JND-based sub-threshold scaling protocols (e.g., AIC-3) rather than mean opinion scores from ACR or DSIS tests; this is critical in high-fidelity and visually lossless regimes (Jenadeleh et al., 14 Jun 2025).
Dual-method subjective protocols (plain triplet and flicker-boosted) increase sensitivity to subtle artifact signatures.
NR VQA models should incorporate exponential HDRMAX features and contrast-segmented statistics to boost sensitivity to distortions in both bright and dark regions, and should be retrained or fine-tuned on HDR2SDR and category-balanced subsets (Ebenezer et al., 2023, Ebenezer et al., 2023).
FR models should employ perceptually uniform preprocessing (e.g., PU21 mapping/qHDRMAX) and wavelet decomposition if targeting computational efficiency and cross-database robustness (Venkataramanan et al., 2023).
For SFV (segment-level assessment), temporal pooling strategies may require adaptation (mean-based or content-adaptive) due to heightened temporal variability over very short clips (Wang et al., 2024).
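
To illustrate why the choice of temporal pooling matters for short clips, the sketch below compares three generic pooling rules applied to per-frame quality scores. These are common options in the VQA literature and are not claimed to be the strategies used in the cited work; the frame scores and the 20% fraction are hypothetical.

```python
import math


def pool_mean(scores):
    """Arithmetic mean: every frame counts equally."""
    return sum(scores) / len(scores)


def pool_harmonic(scores, eps=1e-6):
    """Harmonic mean: low-quality frames pull the result down more strongly."""
    return len(scores) / sum(1.0 / max(s, eps) for s in scores)


def pool_worst_fraction(scores, fraction=0.2):
    """Mean of the lowest-scoring fraction of frames."""
    k = max(1, math.ceil(len(scores) * fraction))
    worst = sorted(scores)[:k]
    return sum(worst) / k


# Hypothetical per-frame scores with one brief quality drop (e.g., exposure flicker)
frame_scores = [4, 4, 4, 4, 1, 4, 4, 4, 4, 4]
pooled = (
    pool_mean(frame_scores),
    pool_harmonic(frame_scores),
    pool_worst_fraction(frame_scores),
)
print([round(v, 2) for v in pooled])
```

In a 5-second clip a single transient artifact occupies a much larger share of the viewing time than in long-form content, which is one reason a pooling rule tuned on longer videos may not transfer directly.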

5. Compression, Codec, and Streaming Implications

Compression studies on SFV+HDR show pronounced differences in coding efficiency and rate–distortion characteristics relative to SDR:

In objective-only gaming HDR studies, AV1 consistently surpasses HEVC, H.264, and VP9 in compression efficiency at 4K, 10-bit PQ, with average BD-BR values of −6.33% (vs. VP9) and −1.58% (vs. HEVC), i.e., bitrate savings of 6.33% and 1.58% at equal quality (Barman et al., 2021).
Rate-control artifacts (e.g., I-frame dips in AV1 under strict CBR) and per-frame PSNR variation inform practical recommendations for bitrate ladder and encoder selection.
Optimal bitrate thresholds for HDR/SDR crossovers are display-dependent, with OLEDs supporting HDR at lower rates and entry-level LEDs suggesting tiered representations (Ebenezer et al., 2023).
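
The BD-BR figures above express the average bitrate difference between two codecs at equal quality. The following sketch shows the principle under a simplifying assumption: log-bitrate is interpolated piecewise-linearly over quality, whereas the standard Bjøntegaard method fits a cubic polynomial (or uses piecewise cubic interpolation). The rate and PSNR points are hypothetical.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation of ys over strictly increasing xs."""
    for k in range(len(xs) - 1):
        if xs[k] <= x <= xs[k + 1]:
            t = (x - xs[k]) / (xs[k + 1] - xs[k])
            return ys[k] + t * (ys[k + 1] - ys[k])
    raise ValueError("x outside interpolation range")


def _mean_log_rate(points, lo, hi, steps=1000):
    """Average log-bitrate over the quality interval [lo, hi] (trapezoid rule)."""
    pts = sorted(points, key=lambda p: p[1])  # sort by quality
    quality = [p[1] for p in pts]
    log_rate = [math.log(p[0]) for p in pts]
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = min(lo + i * h, hi)  # guard against floating-point overshoot
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * _interp(x, quality, log_rate)
    return total * h / (hi - lo)


def bd_rate(anchor, test):
    """Approximate BD-rate in percent; negative means `test` needs less bitrate.

    Each input is a list of (bitrate, quality) points with distinct qualities.
    """
    lo = max(min(q for _, q in anchor), min(q for _, q in test))
    hi = min(max(q for _, q in anchor), max(q for _, q in test))
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")
    diff = _mean_log_rate(test, lo, hi) - _mean_log_rate(anchor, lo, hi)
    return (math.exp(diff) - 1.0) * 100.0


# Hypothetical (bitrate in kbps, PSNR in dB) points; the test codec needs 10% less rate
anchor = [(1000, 32.0), (2000, 35.0), (4000, 38.0), (8000, 41.0)]
test = [(r * 0.9, q) for r, q in anchor]
print(round(bd_rate(anchor, test), 2))
```

Averaging in the log-rate domain is what makes the result a relative (percentage) difference, so a uniform 10% rate reduction at every quality level yields a BD-rate of −10%.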

6. Emerging Techniques and Open Problems

Large multimodal models such as HDR-Q extend VQA for SFV+HDR by leveraging HDR-aware contrastive pretraining, explicit RL-based policy optimization (HAPO), and per-token grounding to HDR-specific content, attaining SRCC≥0.92 on Beyond8Bits and superior zero-shot transfer (Saini et al., 1 Mar 2026). Content-adaptive regression heads, per-category error analysis, and explicit pooling strategies to model perceptual masking or JND scaling represent future frontiers.

Limitations persist in cross-lab reproducibility, HDR display calibration, and the lack of true HDR training resources for models. Not all models adequately consider the influence of tone-mapping, color spaces, and metadata signaling on perceived fidelity. Display technology, ambient lighting, and user device heterogeneity remain active areas of research.

7. Summary Table: Metric Performance in SFV+HDR

Metric/Model	Context	Median SRCC	Comments
HDRPatchMAX	NR, multi-TV	0.859	Robust to device/dynamic range
HIDRO-VQA	NR, HDR-only	0.879	Contrastive HDR fine-tuning
3C-FUNQUE+	FR, LIVE-HDR	0.898–0.902	SOTA among full-ref
FAST-VQA	NR, SFV/UGC	0.75 (SDR)	Drops to ≈0.54 (HDR2SDR)
VMAF	FR	0.56	Weak on HDR/SDR toggling
HDR-VDP-2	FR, IQA	0.94	Highest PLCC/AIC-HDR2025

All statistical values follow source-reported content splits (Ebenezer et al., 2023, Wang et al., 2024, Venkataramanan et al., 2023, Saini et al., 2023, Jenadeleh et al., 14 Jun 2025).

In conclusion, SFV+HDR quality assessment is rapidly converging on HDR-specific, dynamic range-aware VQA/IQA models and datasets. Sub-JND resolution in subjective protocols, nonlinearity-boosted statistics, deep self-supervised feature transfer, and hybrid regression pipelines define the new methodological baseline. Continued research should address temporal pooling for very short videos, content-adaptive modeling, encoder/decoder complexity, display/capture device idiosyncrasies, and the generalization of these pipelines across real-world deployments.