Discrepancy in multilingual attack trends (LRL vs. HRL)

Determine why prior reports that low-resource language (LRL) attacks outperform high-resource language (HRL) attacks are not reproduced in the authors’ multilingual evaluation, and identify the causes of the differing outcomes.

Background

Prior work reported that adversarial attacks in low-resource languages tend to outperform those in high-resource languages. The authors ran a multilingual evaluation covering high-, medium-, and low-resource languages (HRL, MRL, and LRL) and did not observe the same trend; they explicitly left the investigation of this difference to future work.

This raises a targeted empirical question about cross-lingual robustness differences, potentially involving translation quality, dataset selection, model behavior across languages, or evaluation methodology.
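
One way to make the comparison concrete is to compute the attack success rate (ASR) per resource tier together with an uncertainty estimate, so that apparent differences between tiers can be judged against sampling noise. The sketch below is illustrative only and is not the authors' evaluation code: the function names `wilson_interval` and `asr_by_tier`, the `(tier, attack_succeeded)` record format, and the toy records are all assumptions introduced here, and the toy records are placeholders rather than results.

```python
import math
from collections import defaultdict


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


def asr_by_tier(records):
    """Aggregate (tier, attack_succeeded) pairs into {tier: (asr, n, (lo, hi))}.

    The record format is a hypothetical choice made for this sketch.
    """
    counts = defaultdict(lambda: [0, 0])  # tier -> [successes, total]
    for tier, succeeded in records:
        counts[tier][0] += int(bool(succeeded))
        counts[tier][1] += 1
    return {t: (s / n, n, wilson_interval(s, n)) for t, (s, n) in counts.items()}


# Placeholder records for illustration only; not real evaluation data.
toy_records = [("LRL", True), ("LRL", False), ("HRL", True), ("HRL", False)]
for tier, (asr, n, (lo, hi)) in sorted(asr_by_tier(toy_records).items()):
    print(f"{tier}: ASR={asr:.2f} (n={n}, 95% CI {lo:.2f} to {hi:.2f})")
```

If the per-tier intervals overlap heavily, the evaluation may simply lack the sample size to separate the tiers, which is one methodological explanation worth ruling out before turning to translation quality, dataset selection, or model behavior across languages.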

References

In both \citep{yong2023low} and \citep{shen2024language}, it was observed that LRL attacks perform better than HRL attacks. We do not see that trend in Table \ref{table:multilingual_asr}. We leave investigation of this to future work.

Improving Alignment and Robustness with Circuit Breakers (2406.04313 - Zou et al., 6 Jun 2024) in Section: Multilingual Results