Discrepancy in multilingual attack trends (LRL vs. HRL)
Investigate why prior reports found that low-resource language (LRL) attacks outperform high-resource language (HRL) attacks while the authors' multilingual evaluation shows no such trend, and determine what causes the differing outcomes.
References
In both \citep{yong2023low} and \citep{shen2024language}, it was observed that LRL attacks perform better than HRL attacks. We do not see that trend in Table \ref{table:multilingual_asr}. We leave investigation of this to future work.
— Improving Alignment and Robustness with Circuit Breakers
(2406.04313 - Zou et al., 6 Jun 2024) in Section: Multilingual Results