Viability of LLMs as a Foundation for Agentic AI in Real Business Use Cases

Determine whether large language models (LLMs) provide a viable foundation for agentic artificial intelligence systems deployed in real business use cases, by assessing their reliability, security, and performance when embedded in multi-step, enterprise-context applications.

Background

The paper argues that effective agentic AI will depend less on general-purpose foundation model capabilities and more on application-level validation aligned with principal stakeholder objectives within complex enterprise settings. It highlights the compounding error risks of multi-step workflows and information-theoretic gaps between general pretraining and specific enterprise contexts.
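The compounding-error concern can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the paper: it assumes every step of a workflow succeeds independently with the same probability, and the function name `end_to_end_success` and the 0.95 per-step figure are hypothetical. Under those assumptions, the chance that an n-step workflow completes without any error is the per-step probability raised to the power n.

```python
def end_to_end_success(per_step_success: float, steps: int) -> float:
    """Probability that all steps succeed, ASSUMING independent, identical steps.

    This is a simplifying assumption for illustration; real workflows may have
    correlated failures or recovery mechanisms that change the picture.
    """
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be a probability in [0, 1]")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps


if __name__ == "__main__":
    # Hypothetical per-step reliability of 95%, chosen only for illustration.
    for n in (1, 5, 10, 20):
        print(f"{n:>2} steps: {end_to_end_success(0.95, n):.3f}")
```

Even a step that is right 95% of the time yields a workflow that finishes cleanly well under half the time at 20 steps, which is why validation at the application level, rather than per-step model quality alone, matters in this framing.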

Within this framing, the authors survey limitations of LLMs, including hallucinations, jailbreak vulnerabilities, limited context windows, and performance that varies across model updates. These limitations raise doubts about the suitability of LLMs as the core reasoning engine in high-stakes, validated enterprise systems. The authors suggest that, with strong validation, simpler or more interpretable models may suffice, which leaves open whether LLMs can meet the bar required for robust agentic AI in real deployments.

References

It remains to be seen if LLMs provide a viable foundation for agentic AI systems in real business use cases.

— Validity Is What You Need (2510.27628 - Benthall et al., 31 Oct 2025) in Section 5: The irony: with strong validation, why the foundation?
