THQA: A Perceptual Quality Assessment Database for Talking Heads
Abstract: In the realm of media technology, digital humans have gained prominence thanks to rapid advances in computer technology. However, the manual modeling and control required by most digital humans remain a significant obstacle to efficient development. Speech-driven methods offer a novel avenue for manipulating the mouth shapes and expressions of digital humans. Despite the proliferation of such driving methods, the quality of many generated talking head (TH) videos remains a concern, degrading users' visual experience. To tackle this issue, this paper introduces the Talking Head Quality Assessment (THQA) database, featuring 800 TH videos generated by 8 diverse speech-driven methods. Extensive experiments confirm the THQA database's richness in character and speech features. Subsequent subjective quality assessment experiments analyze how the resulting scores correlate with the speech-driven method used and with character age and gender. In addition, experimental results show that mainstream image and video quality assessment methods have clear limitations when applied to the THQA database, underscoring the need for further research to advance TH video quality assessment. The THQA database is publicly accessible at https://github.com/zyj-2000/THQA.
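The abstract describes benchmarking mainstream image/video quality models against the subjective scores collected for THQA. The paper page does not include code, so below is a minimal sketch of the conventional evaluation protocol for such a benchmark: comparing a metric's predictions with mean opinion scores (MOS) via SRCC/KRCC, plus PLCC/RMSE after a four-parameter logistic mapping. The file names `mos.csv` and `pred.csv`, their layout, and the specific logistic form are assumptions for illustration, not details from the paper or the released database.

```python
# Minimal sketch (assumed file layout, not the authors' released code) of the
# standard correlation-based protocol for benchmarking an objective quality
# metric against subjective MOS, as commonly done in VQA studies.
import numpy as np
from scipy.stats import spearmanr, pearsonr, kendalltau
from scipy.optimize import curve_fit

# Hypothetical inputs: one score per TH video (800 values each).
mos = np.loadtxt("mos.csv", delimiter=",")    # subjective mean opinion scores
pred = np.loadtxt("pred.csv", delimiter=",")  # objective metric predictions

# Rank-based agreement is computed directly on the raw predictions.
srcc, _ = spearmanr(pred, mos)   # monotonic (rank) agreement
krcc, _ = kendalltau(pred, mos)  # ordinal agreement

# A conventional 4-parameter logistic mapping (an assumption here; papers
# sometimes use a 5-parameter variant) aligns predictions to the MOS scale
# before computing linear correlation and error.
def logistic4(x, b1, b2, b3, b4):
    return (b1 - b2) / (1.0 + np.exp(-(x - b3) / b4)) + b2

p0 = [np.max(mos), np.min(mos), np.mean(pred), np.std(pred) + 1e-6]
params, _ = curve_fit(logistic4, pred, mos, p0=p0, maxfev=10000)
mapped = logistic4(pred, *params)

plcc, _ = pearsonr(mapped, mos)              # linear agreement after mapping
rmse = np.sqrt(np.mean((mapped - mos) ** 2)) # prediction error after mapping

print(f"SRCC={srcc:.4f}  KRCC={krcc:.4f}  PLCC={plcc:.4f}  RMSE={rmse:.4f}")
```

Under this protocol, the paper's finding would correspond to low SRCC/PLCC values for mainstream no-reference image and video quality models on the 800 THQA videos.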