
Validity Is What You Need (2510.27628v1)

Published 31 Oct 2025 in cs.AI

Abstract: While AI agents have long been discussed and studied in computer science, today's Agentic AI systems are something new. We consider other definitions of Agentic AI and propose a new realist definition. Agentic AI is a software delivery mechanism, comparable to software as a service (SaaS), which puts an application to work autonomously in a complex enterprise setting. Recent advances in LLMs as foundation models have driven excitement in Agentic AI. We note, however, that Agentic AI systems are primarily applications, not foundations, and so their success depends on validation by end users and principal stakeholders. The tools and techniques needed by the principal users to validate their applications are quite different from the tools and techniques used to evaluate foundation models. Ironically, with good validation measures in place, in many cases the foundation models can be replaced with much simpler, faster, and more interpretable models that handle core logic. When it comes to Agentic AI, validity is what you need. LLMs are one option that might achieve it.

Summary

  • The paper asserts that robust, context-sensitive validation enhances Agentic AI efficacy by aligning models with stakeholder objectives.
  • It details a mechanism design-inspired approach that integrates context modeling, objective specification, and continuous feedback.
  • The study finds that smaller, interpretable models can outperform large foundation models in enterprise agentic workflows.

Validity-Centric Design in Agentic AI Systems

Introduction

"Validity Is What You Need" (2510.27628) critically examines the contemporary surge in Agentic AI, particularly the application of foundation models such as LLMs in enterprise contexts. The paper advances a realist definition of Agentic AI, positioning it as a software delivery mechanism akin to SaaS, which autonomously executes multi-step tasks within complex organizational environments. The central thesis is that the success of Agentic AI hinges not on the general capabilities of foundation models, but on rigorous validation processes that ensure alignment with stakeholder objectives and operational requirements. The authors argue that, paradoxically, robust validation may obviate the need for large foundation models, favoring smaller, interpretable, and more efficient models.

Definitional Landscape of Agentic AI

The paper systematically reviews definitions of "agency" from classical AI, legal theory, contemporary research, and industry. Classical AI frames agents as sensorimotor systems optimizing for specified goals, often formalized via the Bellman equation in RL. Legal perspectives emphasize the principal-agent relationship, focusing on the agent's obligation to act in the principal's interest, which introduces alignment and fiduciary challenges. Contemporary research and industry definitions highlight adaptability, autonomy, and multi-step reasoning in complex environments, often operationalized through LLM orchestration and tool integration.
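The classical RL framing mentioned above is conventionally formalized via the Bellman optimality equation; a standard statement in the usual MDP notation (the symbols are the textbook convention, not notation from the paper) is:

```latex
V^{*}(s) = \max_{a \in \mathcal{A}} \left[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{*}(s') \right]
```

Here the agent's "goal" is fully specified by the reward function \(R\) and discount \(\gamma\), which is precisely the specified-goal framing that the legal and enterprise definitions below complicate.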

The authors propose a realist definition: Agentic AI is a service delivery model that embeds multi-step AI tools into enterprise workflows, managing sociotechnical complexity through context-aware action sequences. This definition shifts the focus from abstract autonomy to situated, stakeholder-driven application.

Foundations vs. Applications: The Value Chain

The paper delineates the supply chain of Agentic AI applications, emphasizing the distinction between foundation model pretraining, finetuning, and user-specific adaptation. While foundation models offer economies of scale and generalization, their utility in enterprise Agentic AI is constrained by the specificity of stakeholder requirements and proprietary contexts. The authors highlight that the last mile—translating general model capabilities into situated value—is the most challenging and least amenable to generic solutions.

Information-Theoretic and Practical Limits

Three core challenges are identified:

  1. Pretraining Information Gap: Foundation models, trained on general data, lack access to proprietary, context-specific information critical for enterprise deployment. This gap is exacerbated by model drift, supply chain shifts, and vendor lock-in, undermining reliability and necessitating continuous auditing.
  2. Designer Knowledge Constraints: Application designers often lack full visibility into both the internals of foundation models and the evolving needs of principal stakeholders. Alignment is achieved through operational artifacts—tests, guardrails, finetuning datasets—that encode stakeholder objectives, but these require ongoing adaptation.
  3. Stakeholder Confidence: End-user trust cannot be guaranteed by general model benchmarks; it must be earned through application-specific validation, verification, and transparent governance.

The paper asserts that multi-step agentic workflows compound error rates, making end-to-end validation and granular monitoring essential. The authors advocate for scenario-based validation, model drift detection, bias testing, robust guardrails, and adversarial robustness as critical components of the validation surface.
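The compounding-error point can be made concrete with a back-of-the-envelope calculation: if each step of a workflow succeeds independently with probability p, an n-step pipeline succeeds end-to-end with probability p^n. The independence assumption and the numbers below are ours, for illustration only:

```python
def pipeline_success_rate(per_step_success: float, n_steps: int) -> float:
    """End-to-end success probability of an n-step workflow,
    assuming each step succeeds independently."""
    return per_step_success ** n_steps

# A 95%-reliable step looks strong in isolation, but a 20-step
# agentic workflow built from such steps succeeds only about
# 36% of the time end-to-end.
print(round(pipeline_success_rate(0.95, 1), 3))   # 0.95
print(round(pipeline_success_rate(0.95, 20), 3))  # 0.358
```

This is why the paper treats granular, per-step monitoring as essential: an end-to-end metric alone cannot localize which step is eroding reliability.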

Agentic AI Application Design Heuristics

The authors propose a mechanism design-inspired process for Agentic AI development:

  1. Context Modeling: Treat the enterprise as a multi-agent sociotechnical system, mapping stakeholders, incentives, resources, and constraints.
  2. Objective Specification: Operationalize principal objectives within the modeled context, ensuring they are well-defined and measurable.
  3. Feedback and Failure Analysis: Anticipate distributional shifts, data bias, and stakeholder behavioral changes induced by the agentic system; encode these as additional constraints and guardrails.
  4. System Implementation: Select tools and models—foundation or otherwise—based on fit to requirements, not on general capability.
  5. Validation and Iteration: Employ continuous monitoring, stakeholder feedback, and retraining to maintain alignment and performance.

This process foregrounds the primacy of application-level validation over model-centric evaluation, emphasizing the need for dynamic, context-sensitive governance.
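The five-step process can be illustrated with a minimal sketch in which objectives and guardrails are encoded as executable scenarios and deployment is gated on the application-level pass rate. All names, types, and thresholds here are our own illustrative scaffolding, not an interface from the paper:

```python
from typing import Callable, Sequence

Agent = Callable[[str], str]
Scenario = Callable[[Agent], bool]  # True if the agent passes the case

def pass_rate(agent: Agent, scenarios: Sequence[Scenario]) -> float:
    """Application-level metric: the fraction of stakeholder-derived
    scenarios the agent passes (steps 2 and 5 above)."""
    return sum(s(agent) for s in scenarios) / len(scenarios)

def validated_for_deployment(agent: Agent,
                             scenarios: Sequence[Scenario],
                             threshold: float = 0.95) -> bool:
    """Gate deployment on the scenario pass rate rather than on
    generic foundation-model benchmarks."""
    return pass_rate(agent, scenarios) >= threshold

# Hypothetical toy agent and scenarios, for illustration only.
def upcase_agent(task: str) -> str:
    return task.upper()

scenarios: list[Scenario] = [
    lambda a: a("hello") == "HELLO",  # objective encoded as a test
    lambda a: a("") == "",            # guardrail: empty input is safe
]
print(validated_for_deployment(upcase_agent, scenarios))  # True
```

In a real deployment, a failed gate would trigger the failure analysis and retraining of steps 3 and 5 before another validation round.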

Implications for Model Selection and System Architecture

A central claim is that strong validation processes diminish the necessity for large foundation models in Agentic AI. Small language models (SLMs), quantized models, or classical expert systems may offer superior interpretability, efficiency, and security in enterprise settings. The paper cites evidence of LLM vulnerabilities—jailbreaking, hallucinations, prompt injection, lack of confidentiality awareness, and subpar performance in coding assistance—arguing that these limitations undermine their suitability for high-stakes agentic applications.

Alternatives such as SLMs, utility-theoretic frameworks, dynamic programming, and graph-based reasoning are posited as more robust and cost-effective for many agentic use cases. The authors suggest that the excitement generated by foundation models will ultimately drive demand for skillful application of these established technologies.
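One concrete reading of this claim: once an application-level validation suite exists, candidate systems (foundation model or otherwise) can be ranked by cost and the cheapest one that clears the stakeholder-agreed bar selected. A hedged sketch, with invented candidate names, relative costs, and pass rates:

```python
from typing import Callable, Optional, Sequence

# Each candidate: (name, relative_cost, validate), where validate()
# returns the pass rate on the application's own validation suite.
Candidate = tuple[str, float, Callable[[], float]]

def cheapest_valid_model(candidates: Sequence[Candidate],
                         threshold: float) -> Optional[str]:
    """Return the lowest-cost candidate whose validation pass rate
    clears the threshold, or None if no candidate qualifies."""
    passing = [(cost, name) for name, cost, validate in candidates
               if validate() >= threshold]
    return min(passing)[1] if passing else None

# Invented numbers: a rules engine, a small LM, and a large LLM.
candidates: list[Candidate] = [
    ("rules-engine", 1.0, lambda: 0.97),
    ("small-lm",     5.0, lambda: 0.99),
    ("large-llm",   50.0, lambda: 0.99),
]
print(cheapest_valid_model(candidates, threshold=0.95))  # rules-engine
```

With the bar at 0.95, the rules engine wins on cost; raising it to 0.98 would shift selection to the small LM, never requiring the large LLM — the "validity is what you need" selection logic in miniature.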

Theoretical and Practical Implications

The paper challenges the prevailing foundation model-centric paradigm in Agentic AI, advocating for a shift toward application-centric validation and governance. This has several implications:

  • Research: Opens new avenues in application evaluation, mechanism design, and sociotechnical modeling, moving beyond benchmark-driven model assessment.
  • Industry: Encourages investment in validation infrastructure, stakeholder engagement, and modular system architectures that prioritize interpretability and reliability.
  • Future Developments: Anticipates a diversification of agentic AI architectures, with increased adoption of SLMs, hybrid systems, and domain-specific expert models. The emphasis on validation may drive standardization in auditing, monitoring, and feedback mechanisms.

Conclusion

"Validity Is What You Need" reframes Agentic AI as a validation-driven enterprise application paradigm, decoupling its success from the generality of foundation models. The paper argues that rigorous, context-sensitive validation processes are the critical determinant of agentic AI efficacy, and that smaller, interpretable models may ultimately supplant foundation models in mature deployments. This perspective foregrounds the importance of mechanism design, stakeholder alignment, and dynamic governance, setting a research agenda focused on application-level evaluation and robust system design.
