The Age of Synthetic Realities: Challenges and Opportunities (2306.11503v1)
Abstract: Synthetic realities are digital creations or augmentations that are contextually generated using AI methods, leveraging extensive amounts of data to construct new narratives or realities, regardless of the intent to deceive. In this paper, we delve into the concept of synthetic realities and their implications for Digital Forensics and society at large within the rapidly advancing field of AI. We highlight the crucial need to develop forensic techniques capable of identifying harmful synthetic creations and distinguishing them from reality. This is especially important in scenarios involving the creation and dissemination of fake news, disinformation, and misinformation. Our focus extends to various forms of media, such as images, videos, audio, and text, as we examine how synthetic realities are crafted and explore approaches to detecting these malicious creations. Additionally, we shed light on the key research challenges that lie ahead in this area. This study is of paramount importance given the rapid progress of generative AI techniques and their impact on the fundamental principles of Forensic Science.
Authors: João Phillipe Cardenuto, Jing Yang, Rafael Padilha, Renjie Wan, Daniel Moreira, Haoliang Li, Shiqi Wang, Fernanda Andaló, Anderson Rocha, Sébastien Marcel