- The paper reveals that power dynamics, not just annotator subjectivity, drive labeling outcomes in industrial data annotation.
- It employs a constructivist grounded theory methodology with fieldwork and expert interviews to uncover layered socio-economic influences.
- The study advocates for transparent annotation practices to mitigate inherent biases and enhance accountability in machine learning systems.
Between Subjectivity and Imposition: Power Dynamics in Data Annotation for Computer Vision
This paper by Miceli, Schuessler, and Yang investigates the socio-economic and hierarchical structures that shape image data annotation for machine learning in industrial settings. The authors focus on the power dynamics that emerge in these contexts, challenging the notion of data annotation as a neutral practice by highlighting its deeply interpretative and systemic nature.
The research employs a constructivist grounded theory methodology, combining several weeks of fieldwork at two data annotation companies with additional interviews with industry experts. This approach allows the authors to examine the multi-layered reality of data annotation work. Unlike most prior investigations, which centered on the cognitive biases of individual annotators as the primary source of labeling errors, this paper broadens the scope to the external structures and influences that shape the annotation process.
Key findings highlight that annotators' subjectivity is not the sole factor influencing labeling outcomes. Instead, hierarchical power imbalances play a significant role. Client or management-imposed categories and standards drive the interpretation of images and subsequent annotations, frequently overriding annotators’ judgments. This imposition of meaning is often unconsciously accepted by annotators, who perceive the instructed classifications as self-evident truths rather than imposed constructs. The authors use Bourdieu's concept of symbolic power to frame these dynamics, arguing that they contribute to the naturalization of potentially arbitrary classifications within datasets.
The paper provides concrete examples from projects at the observed companies that reveal how layers of actors—ranging from annotators and quality assurance teams to managers and clients—participate in and influence the annotation process. This involvement further complicates accountability, as the iterative and hierarchical nature of the processes dilutes responsibility for the final quality and societal implications of the annotated data.
These insights hold profound implications. The findings suggest that the annotation industry's orientation towards efficiency and cost-effectiveness often neglects the ethical dimensions of data annotation. This omission can perpetuate discriminatory practices in machine learning systems if biases are systematically encoded into datasets.
Practically, the paper calls for comprehensive documentation of the annotation process, advocating transparency about the actors involved and the rationale behind decisions made in dataset creation. Such measures could facilitate deliberative accountability and improve adherence to ethical and regulatory standards such as the GDPR.
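As a rough illustration of what such documentation might capture, the sketch below outlines a minimal provenance record for a single labeling decision. This is not a schema proposed by the paper; the class and field names are assumptions chosen to make the actors and imposed instructions visible rather than to prescribe a format.

```python
# Illustrative sketch only: the paper calls for documenting annotation work,
# but does not prescribe a schema. All class and field names here are
# hypothetical assumptions, not the authors' proposal.
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnnotationRecord:
    """One labeling decision, with enough context to trace who shaped it."""
    image_id: str
    label: str
    annotator_id: str             # who applied the label
    instruction_version: str      # which client/manager guideline was in force
    reviewed_by: List[str] = field(default_factory=list)  # QA reviewers involved
    client_override: bool = False                          # label changed on client request
    rationale: str = ""           # free-text justification for contested cases


# Example: a record that makes a client-imposed relabeling explicit
record = AnnotationRecord(
    image_id="img_0042",
    label="security_guard",
    annotator_id="annotator_17",
    instruction_version="v3.1",
    reviewed_by=["qa_lead_2"],
    client_override=True,
    rationale="Annotator proposed 'person'; client guideline v3.1 requires role labels.",
)
print(record)
```

Keeping reviewer and override fields explicit is one way such documentation could surface, rather than erase, the hierarchical interventions the paper describes.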
Theoretically, the paper underscores the importance of attending to power asymmetries in data annotation and dataset creation practices. Researchers are urged to adopt similar perspectives in their inquiries into data-driven systems, in order to uncover and address the deep-rooted biases that power imbalances can inscribe into technological frameworks.
In conclusion, this paper demonstrates how strongly power dynamics shape both the practice and the outputs of data annotation, challenging AI researchers and practitioners to rethink and refine their approaches in order to build more equitable and accountable machine learning systems.