Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling (2107.03451v3)

Published 7 Jul 2021 in cs.CL and cs.AI

Abstract: Over the last several years, end-to-end neural conversational agents have vastly improved in their ability to carry a chit-chat conversation with humans. However, these models are often trained on large datasets from the internet, and as a result, may learn undesirable behaviors from this data, such as toxic or otherwise harmful language. Researchers must thus wrestle with the issue of how and when to release these models. In this paper, we survey the problem landscape for safety for end-to-end conversational AI and discuss recent and related work. We highlight tensions between values, potential positive impact and potential harms, and provide a framework for making decisions about whether and how to release these models, following the tenets of value-sensitive design. We additionally provide a suite of tools to enable researchers to make better-informed decisions about training and releasing end-to-end conversational AI models.

PDF Abstract

Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling

The paper presented here offers a comprehensive exploration of potential safety issues associated with end-to-end neural conversational AI systems and proposes a framework alongside supportive tooling to anticipate and mitigate these challenges. Unlike traditional task-oriented dialogue systems, E2E systems engage in open-domain conversations, thereby increasing their exposure to generating harmful content derived from training data sourced from diverse online platforms.

Core Safety Concerns

The authors delineate critical safety concerns into three primary effects:

Instigator (Tay) Effect: This occurs when the AI system itself generates harmful content, possibly inciting negative reactions from users.
Yea-Sayer (ELIZA) Effect: The AI model inadvertently agrees with, or fails to adequately challenge, harmful content introduced by users.
Impostor Effect: The system provides unsafe advice in critical situations, such as medical emergencies or interactions involving self-harm.

Methodological Framework

The paper underscores the necessity of a strategic framework to address these concerns while guiding AI system releases responsibly. The framework advocates for:

Examining the intended and unintended use of AI models.
Identifying the potential audience and anticipating possible impacts on various demographic groups.
Envisioning and testing for both beneficial and harmful impacts, which necessitates an examination of historic data and expert consultation.
Leveraging a robust policy framework and transparency measures to govern the distribution and deployment of the model, including user feedback mechanisms to iterate on and improve system safety post-deployment.

Tooling for Safety Assessment

The paper describes a two-fold approach to evaluate model safety:

Unit Tests: Automated testing tools assess initial safety concerns by evaluating the model's propensity to produce harmful language across different input scenarios ("safe", "real world noise", "non-adversarial unsafe", and "adversarial unsafe"). Although these tools provide quick assessments, they are limited by language scope and bias in classifier models.
Integration Tests: These involve human evaluations of model outputs in conversational contexts to gauge performance metrics across safety dimensions with potential metrics like toxicity, sentiment analysis, and negation usage.

The testing toolkit, while indicative, is bounded by the specificity of available datasets and the dynamic nature of offensive and sensitive content across cultures and languages.

Implications and Future Directions

From a theoretical perspective, the paper reinforces the critical importance of value-sensitive design in AI development processes, urging continuous alignment with societal ethics and legal frameworks such as EU AI regulations. Practically, this research informs industry best practices, encouraging early-stage impact forecasting, strategic model training with adversarial examples, and using training datasets curated with values-sensitive tooling.

Future research pathways articulated by the authors include advancing models' Natural Language Understanding (NLU) capabilities to better contextualize dialog, enhancing adaptability through few-shot learning and inference-time control in dynamically changing environments, and developing more expansive, culturally aware evaluation benchmarks.

Overall, the paper calls for interdisciplinary collaboration and transparent dialog among stakeholders to ensure responsible evolution and deployment of conversational AI, fostering systems that are progressively robust and adaptable to the dynamic societal landscapes they operate within.

PDF Markdown Bookmark Chat (Pro)

Authors (7)

Emily Dinan (28 papers)
Gavin Abercrombie (17 papers)
A. Stevie Bergman (6 papers)
Shannon Spruit (2 papers)
Dirk Hovy (57 papers)
Y-Lan Boureau (26 papers)
Verena Rieser (58 papers)

Citations (102)

View on Semantic Scholar

Related Papers

Find Related Papers

YouTube

Show All Videos