
Wikibench: Community-Driven Data Curation for AI Evaluation on Wikipedia (2402.14147v1)

Published 21 Feb 2024 in cs.HC and cs.AI

Abstract: AI tools are increasingly deployed in community contexts. However, datasets used to evaluate AI are typically created by developers and annotators outside a given community, which can yield misleading conclusions about AI performance. How might we empower communities to drive the intentional design and curation of evaluation datasets for AI that impacts them? We investigate this question on Wikipedia, an online community with multiple AI-based content moderation tools deployed. We introduce Wikibench, a system that enables communities to collaboratively curate AI evaluation datasets, while navigating ambiguities and differences in perspective through discussion. A field study on Wikipedia shows that datasets curated using Wikibench can effectively capture community consensus, disagreement, and uncertainty. Furthermore, study participants used Wikibench to shape the overall data curation process, including refining label definitions, determining data inclusion criteria, and authoring data statements. Based on our findings, we propose future directions for systems that support community-driven data curation.


Summary

  • The paper presents a novel approach where community members curate AI evaluation datasets, ensuring diverse and representative inputs.
  • It demonstrates how comparative analysis of AI models can reveal misalignments with community norms and values.
  • The study advocates for scalable community-driven curation methods to enhance the design and evaluation of AI tools.

Empowering Communities in AI Evaluation: The Wikibench Study on Wikipedia

The Advent of Wikibench

The paper addresses a prevalent issue in the deployment and evaluation of AI tools within community contexts, focusing on Wikipedia's AI-based content moderation tools. It introduces Wikibench, a novel system designed to enable community-driven curation of AI evaluation datasets. Wikibench supports collaborative curation by letting community members select data points, label them based on their own judgment, and discuss disagreements to arrive at consensus labels. This approach contrasts with traditional dataset creation, in which developers and annotators outside the community produce labels that may not represent community consensus or capture the diversity of perspectives within it.
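
To make that workflow concrete, here is a minimal Python sketch of the curation loop, assuming a simple majority rule with ties deferred to discussion. The class, field names, and tie-handling policy are illustrative assumptions, not Wikibench's actual implementation.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class DataPoint:
    """One candidate item (e.g., a Wikipedia edit) under community curation."""
    item_id: str
    labels: dict[str, str] = field(default_factory=dict)   # editor -> individual label
    discussion: list[str] = field(default_factory=list)    # free-text discussion thread

    def add_label(self, editor: str, label: str) -> None:
        self.labels[editor] = label

    def consensus(self) -> str | None:
        """Majority label, or None when empty or tied (i.e., needs discussion)."""
        if not self.labels:
            return None
        top = Counter(self.labels.values()).most_common(2)
        if len(top) > 1 and top[0][1] == top[1][1]:
            return None  # ties are resolved through discussion, not by overriding voters
        return top[0][0]

# Three editors label the same edit; the majority label becomes the consensus label.
point = DataPoint("edit:12345")
for editor, label in [("A", "damaging"), ("B", "damaging"), ("C", "not damaging")]:
    point.add_label(editor, label)
print(point.consensus())  # -> "damaging"
```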

Evaluating Wikibench's Effectiveness

A field study on Wikipedia used Wikibench to curate datasets for AI evaluation, revealing several critical insights:

  • Community Consensus and Diverse Perspectives: Datasets curated through Wikibench effectively captured community consensus, disagreement, and uncertainty, suggesting that community-driven curation can surface the nuanced perspectives that traditional dataset creation methods tend to flatten.
  • Practical Implications for AI Evaluation: By comparing two AI models deployed on Wikipedia, the study showed how Wikibench-curated datasets can reveal misalignments between community perspectives and model predictions, underscoring the value of community-driven evaluation datasets; a minimal sketch of this kind of evaluation follows.
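
To illustrate how a curated dataset can separate consensus from contested cases during model evaluation, here is a minimal sketch. It is not the paper's actual analysis pipeline: the agreement threshold, metric names, and data layout are assumptions chosen for illustration.

```python
from collections import Counter

def agreement(labels: list[str]) -> float:
    """Fraction of labelers who chose the modal label (1.0 = unanimous)."""
    return max(Counter(labels).values()) / len(labels)

def evaluate(model_preds: dict[str, str],
             curated: dict[str, list[str]],
             threshold: float = 0.75) -> dict[str, float]:
    """Score a model against consensus labels, while reporting the share of
    items too contested to score: disagreement is signal, not noise."""
    correct = scored = contested = 0
    for item_id, labels in curated.items():
        if agreement(labels) < threshold:
            contested += 1  # report contested items instead of averaging them away
            continue
        consensus = Counter(labels).most_common(1)[0][0]
        scored += 1
        correct += int(model_preds.get(item_id) == consensus)
    return {
        "accuracy_on_consensus": correct / max(scored, 1),
        "contested_fraction": contested / len(curated),
    }

# Hypothetical usage: one model evaluated against community-curated labels.
curated = {
    "edit:1": ["damaging", "damaging", "damaging"],      # clear consensus
    "edit:2": ["damaging", "not damaging", "damaging"],  # contested (2/3 < 0.75)
}
print(evaluate({"edit:1": "damaging", "edit:2": "damaging"}, curated))
```

Running the same evaluation for two deployed models would show not only which one better matches consensus labels, but also how much of the dataset the community itself considers contested.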

Future Directions in Community-driven Data Curation

The findings from this paper suggest several promising directions for advancing community-driven data curation and AI evaluation:

  • Adapting Wikibench Across Different Communities: Given Wikibench's success on Wikipedia, future research could explore adapting it to other community contexts, tailoring its design to the norms and workflows specific to each community.
  • Enhancing Efficiency and Representativeness: Additional studies could focus on balancing community agency in the curation process against the efficiency of data collection and the representativeness of the resulting dataset. This may involve methods that guide communities toward desired distributional properties for their datasets, as sketched after this list.
  • Community-facing Evaluation Interfaces: There is a need for interfaces that enable communities to use their curated datasets for informed decision-making about AI design and deployment, supporting more nuanced analyses of how AI models align with community perspectives.
  • Leveraging Content Curation Mechanisms: Wikibench and similar systems could draw further inspiration from existing content curation mechanisms on online platforms, for example by prioritizing data points for curation based on community-shared visions and values.
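
As one illustration of the point about distributional properties, the sketch below assumes a system could recommend which uncurated items to surface for labeling next, using stratified sampling toward community-chosen target shares. The function name, stratification scheme, and targets are hypothetical, not a feature of Wikibench.

```python
import random

def curation_queue(candidates: dict[str, str],
                   targets: dict[str, float],
                   n: int,
                   seed: int = 0) -> list[str]:
    """Suggest up to n uncurated items to surface next, so that the labeled
    set drifts toward a target distribution over strata (e.g., edit types)."""
    rng = random.Random(seed)
    by_stratum: dict[str, list[str]] = {}
    for item_id, stratum in candidates.items():
        by_stratum.setdefault(stratum, []).append(item_id)
    queue: list[str] = []
    for stratum, share in targets.items():
        pool = by_stratum.get(stratum, [])
        k = min(len(pool), round(share * n))  # cap by what is actually available
        queue.extend(rng.sample(pool, k))
    return queue

# Hypothetical usage: aim for 30% newcomer edits and 70% established-editor
# edits in the next batch surfaced for community labeling.
candidates = {"edit:1": "newcomer", "edit:2": "established",
              "edit:3": "established", "edit:4": "newcomer"}
print(curation_queue(candidates, {"newcomer": 0.3, "established": 0.7}, n=3))
```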

Conclusion

The Wikibench study marks a pivotal step toward empowering communities in the AI evaluation process. By fostering community-driven dataset curation, Wikibench addresses the critical need for AI tools that align with community norms and values. Its findings illuminate a path forward for HCI systems that support community-driven data curation, ultimately striving for AI tools that enhance rather than disrupt community ecosystems.
