
M-IFEval: Multilingual Instruction-Following Evaluation (2502.04688v1)

Published 7 Feb 2025 in cs.CL and cs.AI

Abstract: Instruction following is a core capability of modern LLMs, making evaluating this capability essential to understanding these models. The Instruction Following Evaluation (IFEval) benchmark from the literature does this using objective criteria, offering a measure of LLM performance without subjective AI or human judgement. However, it only includes English instructions, limiting its ability to assess LLMs in other languages. We propose the Multilingual Instruction Following Evaluation (M-IFEval) benchmark, expanding the evaluation to French, Japanese, and Spanish, with both general and language-specific instructions. Applying this benchmark to 8 state-of-the-art LLMs, we find that benchmark performance across languages and instruction types can vary widely, underscoring the importance of a multilingual benchmark for evaluating LLMs in a diverse cultural context.
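The abstract notes that IFEval-style benchmarks score instruction following with objective, programmatic criteria rather than human or AI judges. The sketch below illustrates what such verifiable checks can look like, including a language-specific check of the kind M-IFEval adds; the function names, example instructions, and scoring logic are illustrative assumptions, not the actual IFEval or M-IFEval implementation.

```python
# A minimal sketch (not the authors' code) of objective, programmatically
# verifiable instruction checks in the style of IFEval / M-IFEval.
# All check names and instructions here are hypothetical examples.

import re

def check_keyword_frequency(response: str, keyword: str, min_count: int) -> bool:
    """General check: the keyword must appear at least min_count times."""
    matches = re.findall(re.escape(keyword), response, flags=re.IGNORECASE)
    return len(matches) >= min_count

def check_no_latin_script(response: str) -> bool:
    """Language-specific check (e.g., for a Japanese prompt):
    the response must contain no Latin-alphabet characters."""
    return re.search(r"[A-Za-z]", response) is None

def evaluate(response: str, checks: list) -> float:
    """Instruction-level accuracy: the fraction of checks the response passes."""
    results = [fn(response, *args) for fn, args in checks]
    return sum(results) / len(results)

# Example: a response to a hypothetical Japanese prompt, scored against
# two instructions. The first fails (no "benchmark" in the text), the
# second passes (no Latin script), so the score is 0.5.
checks = [
    (check_keyword_frequency, ("benchmark", 2)),
    (check_no_latin_script, ()),
]
print(evaluate("ベンチマークは重要です。", checks))  # -> 0.5
```

Because every check is a deterministic function of the response text, scores are reproducible and require no judge model, which is the property the benchmark relies on.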

Authors (4)
  1. Antoine Dussolle (1 paper)
  2. Andrea Cardeña Díaz (1 paper)
  3. Shota Sato (3 papers)
  4. Peter Devine (5 papers)