Sentiment Perception Adversarial Attacks on Neural Machine Translation Systems (2305.01437v2)
Abstract: With the advent of deep learning methods, Neural Machine Translation (NMT) systems have become increasingly powerful. However, deep-learning-based systems are susceptible to adversarial attacks, where imperceptible changes to the input can cause undesirable changes at the output of the system. To date, there has been little work investigating adversarial attacks on sequence-to-sequence systems such as NMT models. Previous work on NMT has examined attacks that aim to introduce target phrases into the output sequence. In this work, adversarial attacks on NMT systems are explored from an output-perception perspective: the aim of an attack is to change the perception of the output sequence without altering the perception of the input sequence. For example, an adversary may distort the sentiment of translated reviews to have an exaggerated positive sentiment. Since it is challenging in practice to run extensive human perception experiments, a proxy deep-learning classifier applied to the NMT output is used to measure perception changes. Experiments demonstrate that the sentiment perception of an NMT system's output sequences can be changed significantly with small, imperceptible changes to input sequences.
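To make the attack setup concrete, the sketch below illustrates one way such a perception attack could be mounted. It is a minimal illustration under stated assumptions, not the paper's exact method: the Hugging Face models (`Helsinki-NLP/opus-mt-en-de` standing in for the NMT system, `nlptown/bert-base-multilingual-uncased-sentiment` standing in for the proxy sentiment classifier) and the greedy WordNet synonym-substitution search are all assumptions introduced for illustration.

```python
# Sketch of a sentiment-perception attack on an NMT system (assumed models,
# not the paper's exact setup). A proxy sentiment classifier scores the
# translated output, and a greedy search over WordNet synonym substitutions
# in the source sentence tries to push that score up while keeping the
# source-side edit small.
from nltk.corpus import wordnet  # requires: nltk.download("wordnet")
from transformers import pipeline

# Assumed stand-ins: an en->de NMT model and a multilingual sentiment model.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
proxy = pipeline("sentiment-analysis",
                 model="nlptown/bert-base-multilingual-uncased-sentiment")

def output_positivity(src: str) -> float:
    """Positivity of the NMT output, as measured by the proxy classifier
    (expected star rating over the model's '1 star'..'5 stars' labels)."""
    de = translator(src)[0]["translation_text"]
    scores = proxy(de, top_k=None)  # scores for all five star labels
    return sum(int(s["label"][0]) * s["score"] for s in scores)

def synonyms(word: str) -> set:
    """WordNet synonyms, used as small source-side substitutions."""
    return {lem.name().replace("_", " ")
            for syn in wordnet.synsets(word)
            for lem in syn.lemmas()} - {word}

def attack(src: str, max_swaps: int = 2) -> str:
    """Greedily swap up to max_swaps source words for synonyms that most
    increase the sentiment of the translated output."""
    words = src.split()
    for _ in range(max_swaps):
        best, best_score = None, output_positivity(" ".join(words))
        for i, w in enumerate(words):
            for cand in synonyms(w.lower()):
                trial = words[:i] + [cand] + words[i + 1:]
                score = output_positivity(" ".join(trial))
                if score > best_score:
                    best, best_score = trial, score
        if best is None:  # no substitution improves the objective
            break
        words = best
    return " ".join(words)

print(attack("The plot was dull and the acting felt flat."))
```

The key design choice, following the abstract, is that the search objective is evaluated on the translated output via the proxy classifier, while the perturbation itself is constrained to small, meaning-preserving synonym swaps on the input side.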
Authors: Vyas Raina, Mark Gales