GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction (2403.17837v1)
Abstract: High Dynamic Range (HDR) content (i.e., images and videos) has a broad range of applications. However, capturing HDR content from real-world scenes is expensive and time-consuming. Therefore, the challenging task of reconstructing visually accurate HDR images from their Low Dynamic Range (LDR) counterparts is gaining attention in the vision research community. A major obstacle in this research problem is the lack of datasets that capture diverse scene conditions (e.g., lighting, shadows, weather, locations, landscapes, objects, humans, buildings) and various image features (e.g., color, contrast, saturation, hue, luminance, brightness, radiance). To address this gap, in this paper we introduce GTA-HDR, a large-scale synthetic dataset of photo-realistic HDR images sampled from the GTA-V video game. We perform a thorough evaluation of the proposed dataset, which demonstrates significant qualitative and quantitative improvements in state-of-the-art HDR image reconstruction methods. Furthermore, we demonstrate the effectiveness of the proposed dataset and its impact on additional computer vision tasks, including 3D human pose estimation, human body part segmentation, and holistic scene segmentation. The dataset, data collection pipeline, and evaluation code are available at: https://github.com/HrishavBakulBarua/GTA-HDR.
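For readers who want to experiment with the released data, the sketch below shows one plausible way to load an LDR/HDR training pair with OpenCV. The directory layout and file names (GTA-HDR/LDR/..., GTA-HDR/HDR/...) are assumptions made for illustration only; consult the repository linked above for the actual structure. Reading Radiance (.hdr) files via OpenCV's IMREAD_ANYDEPTH flag is standard and preserves the linear float32 radiance values.

```python
# A minimal sketch of loading an LDR/HDR pair from GTA-HDR.
# NOTE: paths and file names below are hypothetical; see the
# GitHub repository for the dataset's actual layout.
import cv2
import numpy as np

def load_pair(ldr_path: str, hdr_path: str):
    """Load an 8-bit LDR image and its float32 HDR counterpart."""
    # Standard 8-bit LDR input, scaled to [0, 1].
    ldr = cv2.imread(ldr_path, cv2.IMREAD_COLOR).astype(np.float32) / 255.0
    # Radiance (.hdr) files store linear radiance; IMREAD_ANYDEPTH
    # keeps the 32-bit float values instead of clipping to 8 bits.
    hdr = cv2.imread(hdr_path, cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH)
    return ldr, hdr

# Hypothetical usage:
ldr, hdr = load_pair("GTA-HDR/LDR/000001.png", "GTA-HDR/HDR/000001.hdr")
print(ldr.shape, ldr.dtype, hdr.shape, hdr.dtype)
```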