Ambiguity of disclaimer sufficiency for OpenAI Model Spec compliance
Determine whether adding disclaimers to responses to potentially sensitive user requests is sufficient for compliance with the OpenAI Model Spec. Specifically, clarify the conditions under which a disclaimer makes a response "sufficiently safe" in such scenarios, so as to resolve the observed ambiguity in compliance judgments.
References
Figure~\ref{fig:model_spec_ambiguous} shows cases in which judge models cannot definitively assess specification compliance, revealing fundamental ambiguities in the specification itself. In this example, Claude 4 Sonnet cannot determine whether the responses comply with the OpenAI Model Spec. The central ambiguity is whether a disclaimer suffices to make a response to a potentially sensitive request safe.