Ambiguity of disclaimer sufficiency for OpenAI Model Spec compliance

Determine whether including disclaimers in responses to potentially sensitive user requests is sufficient for compliance with the OpenAI Model Spec. Specifically, clarify the conditions under which a disclaimer renders a response "sufficiently safe," so that the ambiguity noted in compliance judgments can be resolved.

Background

In their qualitative analysis of compliance checks, Zhang et al. present cases where evaluator (judge) models could not decisively determine whether a response complies with the OpenAI Model Spec. One highlighted ambiguity is whether adding a disclaimer to a response to a potentially sensitive request is enough to make that response compliant.

This uncertainty reflects broader interpretive gaps in the specification's language, which lead both responding models and evaluators to inconsistent conclusions. Clarifying when disclaimers are sufficient would reduce disagreement and improve the reliability of compliance assessments across models.

References

Figure~\ref{fig:model_spec_ambiguous} demonstrates cases where judge models cannot definitively assess specification compliance, revealing fundamental specification ambiguities. In this example, Claude 4 Sonnet cannot determine whether responses comply with the OpenAI model specification, with the central ambiguity revolving around whether disclaimers constitute sufficiently safe responses to potentially sensitive requests.

— Stress-Testing Model Specs Reveals Character Differences among Language Models (2510.07686 - Zhang et al., 9 Oct 2025) in Section 4.1, Model Spec Compliance Checks — Ambiguous compliance determination by compliance check judge model
