
Problem Formulation and Fairness (1901.02547v1)

Published 8 Jan 2019 in cs.CY

Abstract: Formulating data science problems is an uncertain and difficult process. It requires various forms of discretionary work to translate high-level objectives or strategic goals into tractable problems, necessitating, among other things, the identification of appropriate target variables and proxies. While these choices are rarely self-evident, normative assessments of data science projects often take them for granted, even though different translations can raise profoundly different ethical concerns. Whether we consider a data science project fair often has as much to do with the formulation of the problem as any property of the resulting model. Building on six months of ethnographic fieldwork with a corporate data science team---and channeling ideas from sociology and history of science, critical data studies, and early writing on knowledge discovery in databases---we describe the complex set of actors and activities involved in problem formulation. Our research demonstrates that the specification and operationalization of the problem are always negotiated and elastic, and rarely worked out with explicit normative considerations in mind. In so doing, we show that careful accounts of everyday data science work can help us better understand how and why data science problems are posed in certain ways---and why specific formulations prevail in practice, even in the face of what might seem like normatively preferable alternatives. We conclude by discussing the implications of our findings, arguing that effective normative interventions will require attending to the practical work of problem formulation.

Citations (192)

Summary

  • The paper demonstrates that translating high-level goals into tractable tasks involves discretionary ethical judgments and subjective decisions.
  • The paper uses ethnographic research to reveal that problem formulation is an iterative, context-dependent process influenced by organizational dynamics.
  • The paper advocates for early normative interventions in data science projects to address fairness concerns and mitigate downstream biases.

Problem Formulation and Fairness in Data Science Projects

This essay analyzes Passi and Barocas's exploration of problem formulation and its impact on fairness in data science projects, highlighting findings from ethnographic research involving a corporate data science team. The paper elucidates the complexities inherent in formulating data science problems, emphasizing that this process is discretionary, negotiated, and frequently devoid of explicit normative considerations.

The authors assert that translating high-level objectives into tractable data science problems is fraught with uncertainty and requires identifying appropriate target variables and proxies. This crucial step is often overlooked in normative assessments, yet it has significant ethical dimensions. Drawing on sociology, the history of science, and critical data studies, the paper argues that whether a project is considered fair often depends as much on how the problem is formulated as on any technical property of the resulting model.

Ethnographic observations detailed in the paper reveal that problem formulation is an iterative, elastic process influenced by various actors and organizational dynamics. For instance, the case study of a special financing project for an auto lending company demonstrates how different formulations, such as using dealer decisions or credit score ranges as proxies for lead quality, carry different ethical implications and lead to different perceptions of fairness in model outcomes.
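To make the proxy choice concrete, here is a minimal sketch of how two formulations of "lead quality" can label the same leads differently. Everything in it is a hypothetical illustration rather than material from the paper: the `Lead` fields, the 500 to 600 score band, and the four sample records are all invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lead:
    lead_id: str
    credit_score: int      # hypothetical applicant credit score
    dealer_financed: bool  # hypothetical record of whether a dealer financed the lead


def label_by_dealer_decision(lead: Lead) -> bool:
    """Formulation A: a lead is 'good' if a dealer chose to finance it."""
    return lead.dealer_financed


def label_by_credit_range(lead: Lead, low: int = 500, high: int = 600) -> bool:
    """Formulation B: a lead is 'good' if its score falls in a target band.

    The 500-600 band is an assumption made for illustration only.
    """
    return low <= lead.credit_score <= high


def disagreements(leads, low: int = 500, high: int = 600):
    """Return the ids of leads that the two formulations label differently."""
    return [
        lead.lead_id
        for lead in leads
        if label_by_dealer_decision(lead) != label_by_credit_range(lead, low, high)
    ]


# Invented sample data, not drawn from the paper.
LEADS = [
    Lead("a", 480, True),
    Lead("b", 550, True),
    Lead("c", 590, False),
    Lead("d", 640, False),
]

if __name__ == "__main__":
    print(disagreements(LEADS))  # ['a', 'c']
```

In this toy data the two target variables disagree on half of the leads, so a model trained on one would learn a different notion of "quality" than a model trained on the other, before any modeling choice is made.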

A key insight is that problem formulations depend on the available data and methods, rather than solely on normative commitments. Passi and Barocas argue that the iterative nature of problem formulation often introduces ethical judgments early in the process. This has theoretical implications for understanding how structural inequalities can be perpetuated or mitigated through choices made during this phase.

Practically, the research underscores the need to pay closer attention to the initial stages of data science projects, where decisions are made about what constitutes a valid target variable or proxy. Intervening at these stages may make it possible to address fairness concerns before they manifest in model outputs.

The researchers advocate for critical engagement with problem formulation as an essential site for normative investigation and intervention. Recognizing problem formulation as a source of ethical issues allows for identifying downstream disparities and biases at their origin. This perspective aligns with the broader call in data ethics to ensure AI systems are developed and deployed with fairness in mind.

Going forward, these insights could inform AI development by subjecting the problem formulation process to rigorous scrutiny. As data science spreads across sectors, careful attention to how problems are framed will be important for safeguarding civil rights and promoting equitable technological advances.