Uncertain Adoption and Power of Holistic and Participatory Evaluations

Determine how widely holistic, multi-desiderata evaluation frameworks such as HELM (Liang et al., 2022b) and participatory, community-driven approaches such as BIG-bench (Srivastava et al., 2022) will be adopted within NLP, and how much power they will acquire as a result.

Background

The author argues that evaluation can enable pluralism by foregrounding multiple desiderata and by inviting broader community participation in benchmark design, and cites holistic frameworks and participatory efforts as promising directions.

However, they explicitly note uncertainty regarding whether such approaches will be widely adopted and thus accrue enough influence to drive change, highlighting a key unresolved question about evaluation’s future role and power in the field.

References

While initial efforts indicate the potential for such holistic approaches that reflect many different desiderata (Liang et al., 2022b) as well as participatory approaches that permit contribution from different entities (e.g. Srivastava et al., 2022), it is still unclear how much adoption such approaches will get, and therefore how much power they will acquire.

Evaluation for Change (2212.11670 - Bommasani, 2022) in Limitations (Section 6)