Waning Power of Evaluation in NLP

Determine whether the current trajectory of natural language processing research implies that the power of evaluation, as a force that drives change through adoption, is waning across the field, despite evaluation’s potential to realize more pluralistic ambitions.

Background

The paper frames evaluation not only as a technical lens but as a sociopolitical force that can drive change by coordinating research through adoption. The authors argue that adoption imbues evaluation with power and that, historically, certain benchmarks have shaped entire paradigms.

Using LLMs as a case study, the paper observes that resources (e.g., compute and money) currently dominate development and that evaluation practices are inconsistent across prominent models. These observations suggest that evaluation may be losing influence relative to competing forces, which motivates the conjecture that its power is waning.

References

Under our analysis, we conjecture that the current trajectory of NLP suggests evaluation's power is waning, in spite of its potential for realizing more pluralistic ambitions in the field.

Evaluation for Change (2212.11670 - Bommasani, 2022) in Abstract (page 1)