A Pathway Towards Responsible AI Generated Content (2303.01325v3)

Published 2 Mar 2023 in cs.AI

Abstract: AI Generated Content (AIGC) has received tremendous attention within the past few years, with content generated in the format of image, text, audio, video, etc. Meanwhile, AIGC has become a double-edged sword and recently received much criticism regarding its responsible usage. In this article, we focus on 8 main concerns that may hinder the healthy development and deployment of AIGC in practice, including risks from (1) privacy; (2) bias, toxicity, misinformation; (3) intellectual property (IP); (4) robustness; (5) open source and explanation; (6) technology abuse; (7) consent, credit, and compensation; (8) environment. Additionally, we provide insights into the promising directions for tackling these risks while constructing generative models, enabling AIGC to be used more responsibly to truly benefit society.

An Examination of Responsible AI in AI-Generated Content

The paper "A Pathway Towards Responsible AI Generated Content" by Chen Chen, Jie Fu, and Lingjuan Lyu addresses the multifaceted challenges and risks associated with AI-generated content (AIGC). Experts in machine learning, AI ethics, and data privacy will find the discussion invaluable in navigating the complexities of deploying AIGC responsibly.

The authors open by recognizing the breadth of AIGC's influence, spanning image, text, audio, and video generation technologies powered by foundation models like GPT and CLIP. Their focus shifts to identifying eight primary risks that may hinder the responsible development and deployment of AIGC:

  1. Privacy Concerns: The authors detail the vulnerability of generative models to privacy leakage, emphasizing the replication risks observed in models like Stable Diffusion. They suggest training-data deduplication and differential privacy as potential mitigations (a minimal sketch of the deduplication idea follows this list).
  2. Bias, Toxicity, and Misinformation: The authors acknowledge that uncurated web-scale datasets reflect societal biases, which can lead AIGC models to reinforce harmful stereotypes. The paper discusses technological interventions such as data filtering and reinforcement learning from human feedback (RLHF) to mitigate bias and misinformation.
  3. Intellectual Property (IP): The paper raises complex questions about IP rights over AI-generated works and highlights the difficulty of detecting copyright infringement when models memorize and replicate content from their training datasets.
  4. Robustness: The paper addresses the threat of backdoor attacks on foundation models and calls for methodologies that safeguard the integrity of large-scale generative models against such vulnerabilities.
  5. Responsible Open Source and Explanation: The authors scrutinize the transparency issues in models like GPT-4, advocating for responsible open-sourcing practices and comprehensive explanations to improve public trust and accountability.
  6. Limiting Technology Abuse: The paper warns against misuse of AIGC, exemplified by deepfakes and machine-generated misinformation, and stresses the urgency of establishing ethical governance and regulatory frameworks.
  7. Consent, Credit, and Compensation: Highlighting the ethical necessity of obtaining consent for data use, the authors propose compensation structures so that data contributors share equitably in the benefits of AI advancements.
  8. Environmental Impact: Considering the immense computational cost of training colossal models like GPT-3, the paper underscores the need for energy-efficient strategies in model design and operation (a back-of-envelope estimate of this cost follows the sketch below).

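The paper names deduplication (risk 1) and replication detection (risk 3) only at a conceptual level; the sketch below is a minimal, hypothetical illustration of both ideas using exact n-gram overlap. The n-gram length, thresholds, and all function names are assumptions for illustration; production systems typically rely on scalable techniques such as MinHash or suffix arrays instead.

```python
from collections import defaultdict

N = 8  # n-gram length; an assumption (published pipelines often use longer spans)

def ngrams(tokens, n=N):
    """Set of all length-n token windows in a document."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def build_index(corpus):
    """Map each n-gram to the set of training documents containing it."""
    index = defaultdict(set)
    for doc_id, tokens in enumerate(corpus):
        for g in ngrams(tokens):
            index[g].add(doc_id)
    return index

def near_duplicate_pairs(corpus, index, threshold=0.5):
    """Deduplication: flag document pairs sharing > threshold of their n-grams."""
    pairs = set()
    for doc_id, tokens in enumerate(corpus):
        grams = ngrams(tokens)
        if not grams:
            continue
        shared = defaultdict(int)
        for g in grams:
            for other in index[g]:
                if other != doc_id:
                    shared[other] += 1
        pairs.update(
            tuple(sorted((doc_id, other)))
            for other, count in shared.items()
            if count / len(grams) > threshold
        )
    return pairs

def replication_score(generated_tokens, index):
    """Replication check: fraction of a sample's n-grams found verbatim in training data."""
    grams = ngrams(generated_tokens)
    return sum(g in index for g in grams) / len(grams) if grams else 0.0

corpus = [
    "the quick brown fox jumps over the lazy dog near the river bank".split(),
    "the quick brown fox jumps over the lazy dog near the river bank today".split(),
    "generative models can memorize rare training examples verbatim".split(),
]
index = build_index(corpus)
print(near_duplicate_pairs(corpus, index))       # {(0, 1)}: near-duplicate documents
sample = "fox jumps over the lazy dog near the river bank".split()
print(replication_score(sample, index))          # 1.0: output replicated from training data
```

A high replication score on generated output is only a heuristic signal of memorization, not proof of infringement; the paper's point is that even this kind of detection becomes hard at web scale.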
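The paper likewise does not quantify the environmental cost; as a hedged back-of-envelope sketch, the widely used approximation that training a dense transformer takes roughly 6 FLOPs per parameter per token gives a sense of GPT-3's scale. The parameter and token counts below are commonly reported figures, and the hardware numbers are illustrative assumptions.

```python
# Rule of thumb: total training compute ~= 6 * parameters * training tokens.
params = 175e9   # GPT-3 parameter count (reported)
tokens = 300e9   # approximate training tokens (reported estimate)
flops = 6 * params * tokens
print(f"~{flops:.2e} training FLOPs")            # ~3.15e+23

# Convert to GPU-days under assumed hardware and utilization.
peak_flops = 312e12   # NVIDIA A100 dense BF16 peak throughput (spec sheet)
utilization = 0.30    # sustained fraction of peak (assumption)
gpu_days = flops / (peak_flops * utilization) / 86400
print(f"~{gpu_days:,.0f} A100-days of compute")  # tens of thousands of GPU-days
```

Multiplying such GPU-day figures by per-GPU power draw and a datacenter's power usage effectiveness yields the energy estimates behind the paper's environmental concern.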
Through this comprehensive list of concerns and mitigation strategies, the authors give researchers a valuable resource for contextualizing the ethical deployment of AIGC technologies. The insights also serve as a call to action for interdisciplinary collaboration to establish standards and policies that guide AI development ethically.

Looking ahead, increasingly sophisticated models will likely require evolving ethical, legal, and environmental frameworks for AI-generated content. Balancing innovation with responsibility demands constant vigilance, which makes the authors' articulation of these challenges especially relevant. Ongoing research should build on such foundational assessments to ensure AI contributes positively to societal progress without compromising ethical standards.
