On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook (2307.16680v7)
Abstract: Diffusion models and large language models (LLMs) have emerged as leading-edge generative models, revolutionizing various aspects of human life. However, practical deployments of these models have also exposed inherent risks, bringing their darker sides to the forefront and sparking concerns about their trustworthiness. Despite the wealth of literature on this subject, a comprehensive survey specifically examining the intersection of large-scale generative models and their trustworthiness remains largely absent. To bridge this gap, this paper investigates both the long-standing and emerging threats associated with these models across four fundamental dimensions: 1) privacy, 2) security, 3) fairness, and 4) responsibility. Based on the results of this investigation, we develop an extensive map outlining the trustworthiness of large generative models. We then provide practical recommendations and identify promising research directions for future secure applications built on large generative models, ultimately promoting the trustworthiness of these models and benefiting society as a whole.