
RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions (2402.16431v1)

Published 26 Feb 2024 in cs.CL

Abstract: LLMs have showcased remarkable capabilities in following human instructions. However, recent studies have raised concerns about the robustness of LLMs when prompted with instructions combined with textual adversarial samples. In this paper, drawing inspiration from recent findings that LLMs are sensitive to the design of instructions, we use code-style instructions, which are more structured and less ambiguous, in place of typical natural language instructions. Through this conversion, we provide LLMs with more precise instructions and strengthen their robustness. Moreover, in few-shot scenarios, we propose a novel method that composes in-context demonstrations from both clean and adversarial samples (\textit{adversarial context method}) to further boost the robustness of LLMs. Experiments on eight robustness datasets show that our method consistently outperforms prompting LLMs with natural language instructions. For example, with gpt-3.5-turbo, our method achieves a 5.68\% improvement in test set accuracy and a 5.66-point reduction in Attack Success Rate (ASR).
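The two ideas in the abstract can be sketched as follows. This is a minimal illustration, not the authors' released code: the exact prompt template RoCoIns uses is not given here, and all function names, field names, and the example perturbation below are hypothetical. The sketch shows (1) rendering a classification instruction as a structured code-style prompt rather than free-form natural language, and (2) interleaving clean and adversarially perturbed demonstrations to form the in-context examples.

```python
def code_style_instruction(task: str, labels: list[str], text: str) -> str:
    """Render a classification instruction as structured pseudo-code
    instead of a free-form natural-language sentence."""
    label_options = " | ".join(repr(label) for label in labels)
    return (
        f"def {task}(sentence: str) -> str:\n"
        f'    """Classify the sentence. Return one of: {label_options}."""\n'
        f"    sentence = {text!r}\n"
        f"    label = ...  # model fills this in\n"
        f"    return label\n"
    )


def adversarial_context(clean_demos, adv_demos):
    """Compose in-context demonstrations from both clean and adversarial
    (perturbed) samples, so the model sees both surface forms of each
    example before answering (the 'adversarial context method')."""
    demos = []
    for clean, adv in zip(clean_demos, adv_demos):
        demos.append(clean)
        demos.append(adv)
    return demos


# Hypothetical usage: a sentiment demo plus its character-level perturbation.
demos = adversarial_context(
    [("The movie was great.", "positive")],
    [("The movi was graet.", "positive")],
)
prompt = "\n".join(
    code_style_instruction("sentiment", ["positive", "negative"], sent)
    + f"# label: {gold}\n"
    for sent, gold in demos
)
```

The resulting `prompt` string would then be sent to the LLM followed by a final code-style instruction for the test input, with the model expected to complete the `label` field.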
