- The paper demonstrates that translating high-level goals into tractable tasks involves discretionary ethical judgments and subjective decisions.
- The paper uses ethnographic research to reveal that problem formulation is an iterative, context-dependent process influenced by organizational dynamics.
- The paper advocates for early normative interventions in data science projects to address fairness concerns and mitigate downstream biases.
Problem Formulation and Fairness in Data Science Projects
This essay analyzes Passi and Barocas's exploration of problem formulation and its impact on fairness in data science projects, highlighting findings from ethnographic research involving a corporate data science team. The paper elucidates the complexities inherent in formulating data science problems, emphasizing that this process is discretionary, negotiated, and frequently devoid of explicit normative considerations.
The authors assert that translating high-level objectives into tractable data science problems is fraught with uncertainty and requires identifying appropriate target variables and proxies. This step is often overlooked in normative assessments, yet it carries significant ethical weight. The paper draws on interdisciplinary insights from sociology, the history of science, and critical data studies to argue that problem formulation substantially shapes perceptions of a project's fairness, beyond the technical properties of the resulting models.
Ethnographic observations detailed in the paper reveal that problem formulation is an iterative, elastic process influenced by various actors and organizational dynamics. For instance, the case study of a special financing project for an auto lending company demonstrates how different formulations, such as using dealer decisions or credit score ranges as proxies for lead quality, carry distinct ethical implications and shape the perceived fairness of model outcomes.
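To make the proxy choice concrete, here is a minimal sketch, not drawn from the paper's actual system; the column names, data, and score band are hypothetical. It shows how two formulations of "lead quality" produce different target labels from the same applicant records, so models trained on them would learn different notions of quality.

```python
import pandas as pd

# Hypothetical applicant records; columns and values are illustrative only.
leads = pd.DataFrame({
    "applicant_id": [1, 2, 3, 4],
    "credit_score": [540, 610, 480, 655],
    "dealer_accepted": [True, True, False, True],  # did a dealer ultimately fund the deal?
})

# Formulation A: a "good lead" is one a dealer accepted.
# This bakes any discretion or bias in past dealer decisions into the label.
leads["label_dealer"] = leads["dealer_accepted"].astype(int)

# Formulation B: a "good lead" is one whose credit score falls in a target band.
# This ties lead quality to a bureau score, with its own known disparities.
leads["label_score_band"] = leads["credit_score"].between(500, 650).astype(int)

# The two formulations disagree on some applicants, so the resulting models
# would distribute errors, and their ethical consequences, differently.
print(leads[["applicant_id", "label_dealer", "label_score_band"]])
```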
Key insights include the dependence of problem formulations on available data and methods rather than solely on normative commitments. Passi and Barocas argue that the iterative nature of problem formulation introduces ethical judgments early in the process, which has theoretical implications for understanding how structural inequalities can be perpetuated or mitigated through choices made during this phase.
Practically, the research underscores the need for greater attention to the initial stages of data science projects, where decisions are made about what constitutes a valid target variable or proxy. Intervening at these stages may open avenues to address fairness concerns before they manifest in model outputs.
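As one hedged illustration of what such an early intervention might look like (this is an assumed auditing step, not a procedure described by the authors, and the group column is hypothetical), a team could compare base rates of a candidate label across groups before committing to a formulation:

```python
def label_base_rates(df, label_col, group_col):
    """Share of positive labels per group for a candidate target variable.

    Large gaps here signal that the formulation itself may encode disparity,
    before any model has been trained.
    """
    return df.groupby(group_col)[label_col].mean()

# Illustrative usage with the hypothetical leads table above, assuming an
# (equally hypothetical) protected-attribute column is available for auditing:
# leads["group"] = ["A", "B", "A", "B"]
# print(label_base_rates(leads, "label_dealer", "group"))
# print(label_base_rates(leads, "label_score_band", "group"))
```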
The researchers advocate for critical engagement with problem formulation as an essential site for normative investigation and intervention. Recognizing problem formulation as a source of ethical issues allows for identifying downstream disparities and biases at their origin. This perspective aligns with the broader call in data ethics to ensure AI systems are developed and deployed with fairness in mind.
Moving forward, the insights from this paper could shape future AI development by subjecting the problem formulation process to rigorous scrutiny. As data science continues to pervade diverse sectors, ensuring the ethical integrity of projects through deliberate framing of problems will be crucial to safeguarding civil rights and promoting equitable technological advances.