
What we learned while automating bias detection in AI hiring systems for compliance with NYC Local Law 144 (2501.10371v1)

Published 13 Dec 2024 in cs.CY and cs.AI

Abstract: Since July 5, 2023, New York City's Local Law 144 requires employers to conduct independent bias audits for any automated employment decision tools (AEDTs) used in hiring processes. The law outlines a minimum set of bias tests that AI developers and implementers must perform to ensure compliance. Over the past few months, we have collected and analyzed audits conducted under this law, identified best practices, and developed a software tool to streamline employer compliance. Our tool, ITACA_144, tailors our broader bias auditing framework to meet the specific requirements of Local Law 144. While automating these legal mandates, we identified several critical challenges that merit attention to ensure AI bias regulations and audit methodologies are both effective and practical. This document presents the insights gained from automating compliance with NYC Local Law 144. It aims to support other cities and states in crafting similar legislation while addressing the limitations of the NYC framework. The discussion focuses on key areas including data requirements, demographic inclusiveness, impact ratios, effective bias, metrics, and data reliability.

Summary

  • The paper demonstrates that automating bias audits with the ITACA_144 tool streamlines compliance while exposing challenges in data relevancy and fairness metrics.
  • It reveals that current regulatory provisions, such as permitting outdated data and excluding small demographic groups, hinder effective bias detection in AI hiring.
  • The study recommends a comprehensive audit approach covering the entire AI system lifecycle to better identify and mitigate effective bias in recruitment processes.

This paper discusses the practical challenges encountered and lessons learned while developing software to automate bias audits for AI hiring systems in compliance with New York City's Local Law 144. The authors, from Eticas.ai, created a tool called ITACA_144, derived from their more comprehensive ITACA_OS platform, to streamline the legally mandated bias audits for Automated Employment Decision Tools (AEDTs).

Local Law 144 requires employers using AEDTs to conduct annual independent bias audits. Eticas.ai's automation effort aimed to make compliance more affordable and transform it into an opportunity for system optimization by identifying and minimizing errors. However, the process highlighted several shortcomings in the law's current formulation and its practical application.

Key learnings and recommendations include:

  1. Data Requirements: The law lacks specific requirements for the data used in audits, such as its recency or geographical relevance. Audits might use outdated or non-local data.
    • Recommendation: Mandate the use of historical data from the last 12 months and ensure it pertains specifically to NYC-relevant hiring processes to reduce temporal and deployment bias.
  2. Demographic Inclusiveness: The law permits excluding demographic categories representing less than 2% of the audit dataset. This provision often leads to the exclusion of groups like American Indian, Alaska Native, Native Hawaiian, Pacific Islander, and others, potentially overlooking bias against these vulnerable populations.
    • Recommendation: Remove the 2% exclusion rule and provide clearer definitions for categories like "Some Other Race".
  3. Impact Ratio vs. Fairness: Law 144 mandates calculating the Impact Ratio (IR), which compares selection rates across demographic groups. IR can be computed in production, where ground-truth labels are unavailable, but on its own it is insufficient to establish fairness: a system can achieve proportional outcomes (similar selection rates) while still discriminating through proxy variables or differential treatment of individuals (a minimal IR calculation is sketched in code after this list).
    • Recommendation: Acknowledge the limitations of IR and incorporate deeper analyses, such as counterfactual fairness checks, to verify that the model provides equal treatment rather than merely proportional outcomes. Clarify whether the law's goal is proportional representation or equal treatment.
  4. Effective Bias: Focusing solely on model metrics overlooks biases introduced before the model (e.g., biased training data, data curation) and after the model's output (e.g., human decisions in the hiring pipeline). Assessing only the AEDT provides a partial view and cannot capture the effective bias of the entire process.
    • Recommendation: Require documentation and transparency throughout the AI system's lifecycle. Audits should ideally capture data from pre-processing, in-processing, and post-processing stages to identify where bias originates and how interventions affect outcomes.
  5. Metrics: The law references the four-fifths (80%) rule, a common threshold below which an impact ratio signals potential adverse impact, but it does not require corrective action when a system falls outside that range. Moreover, other metrics could provide more meaningful guidance.
    • Recommendation: Use benchmarks based on representativity (e.g., comparing hiring demographics to census data) and focus on improving representation relative to the bias already present in the input data. Policymakers should define clear, enforceable metrics that guide developers toward genuinely fairer systems.
  6. Data Reliability: Audits currently depend on data provided by the entity being audited, creating potential for misrepresentation.
    • Recommendation: Regulators should implement random, in-depth spot checks, potentially involving executing the system, to verify the submitted data and deter dishonest reporting, similar to practices in other regulated sectors.
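The following sketch makes the calculations from items 2, 3, and 5 concrete: per-category selection rates, impact ratios relative to the highest-rate category, the 2% exclusion threshold, and the four-fifths flag. It is illustrative only, not the ITACA_144 implementation; the demographic weights and selection probabilities are invented, and a real audit would read actual applicant records rather than simulate them.

```python
# Minimal sketch of Local Law 144's core bias-audit arithmetic.
# All data below is synthetic and the probabilities are invented.
import random
from collections import Counter

random.seed(0)

categories = ["White", "Black or African American", "Asian",
              "Hispanic or Latino", "Native Hawaiian or Pacific Islander"]
weights    = [0.55, 0.20, 0.13, 0.11, 0.01]  # last group falls under 2%
select_p   = {"White": 0.30, "Black or African American": 0.18,
              "Asian": 0.27, "Hispanic or Latino": 0.20,
              "Native Hawaiian or Pacific Islander": 0.10}

# Simulate 2,000 applicants and the AEDT's binary screening decision.
applicants = random.choices(categories, weights=weights, k=2000)
counts   = Counter(applicants)
selected = Counter(c for c in applicants if random.random() < select_p[c])

# 2% rule: categories under 2% of the dataset may legally be excluded.
excludable = {c for c, n in counts.items() if n / len(applicants) < 0.02}

# Impact ratio: each category's selection rate divided by the highest
# category selection rate, as the law mandates.
rates = {c: selected[c] / counts[c] for c in counts}
best  = max(rates.values())
for c, rate in sorted(rates.items(), key=lambda kv: kv[1]):
    ir, notes = rate / best, []
    if c in excludable:
        notes.append("excludable under 2% rule")
    if ir < 0.8:
        notes.append("below four-fifths threshold")
    print(f"{c:38s} rate={rate:.2f}  IR={ir:.2f}  {'; '.join(notes)}")
```

Even in this toy run, the two weaknesses discussed above surface: the smallest group can be dropped from the audit entirely, and groups whose impact ratios fall below 0.8 trigger no required action under the law, only a reported number.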

In conclusion, while NYC Local Law 144 is a significant step towards AI accountability in hiring, its current limitations may lead to compliance exercises that don't genuinely improve fairness or safety. The authors advocate for refining the regulatory requirements based on practical auditing experiences to ensure that bias measurement standards are robust, useful, and promote meaningful improvements in AI systems.
