Sola AI Agent for Open-World ISPM

Updated 18 January 2026

Sola AI Agent is an autonomous system designed for open-world decision-making and data-driven reasoning in identity security posture management.
It features a five-layer architecture that integrates natural-language parsing, dynamic schema retrieval, multi-step reasoning, tool execution, and evidence synthesis.
The agent employs embedding-based example retrieval and Tree-of-Thought planning to ensure auditability and robust cross-platform integration for real-time security insights.

A Sola AI Agent is one of a class of autonomous, agentic AI systems designed and benchmarked primarily for open-world, continual decision-making and data-driven reasoning, with concrete instantiations in security posture management and self-initiated open-world learning. The most prominent instantiation is the Sola AI Agent for Identity Security Posture Management (ISPM), introduced together with the Sola-Visibility-ISPM Benchmark; here Sola refers to “Self-Initiated Open-World Learning Agent” principles applied to foundational enterprise security tasks. The design philosophy is rooted in data-grounded, tool-using architectures that translate natural-language (NL) questions into executable data exploration, ground every query in the current schema, and produce verifiable, evidence-backed outputs (Engelberg et al., 11 Jan 2026). Theoretical frameworks for Sola agents emphasize autonomous detection of novelties, self-driven label acquisition, and incremental adaptation, as outlined by the SOL/SOLA paradigms (Liu et al., 2021, Liu et al., 2022).

1. Architectural Framework and Layered Design

A Sola AI Agent is structured as a five-layer stack reflecting a progression from natural-language interpretation to evidence-backed synthesis:

Natural-Language Interpretation: Parses user ISPM queries, classifying intent (e.g., inventory vs. hygiene checks) and target platforms (AWS, Okta, Google Workspace).
Retrieval Layer: Dynamically acquires platform schema definitions, examples of historical SQL/QA patterns, and scaffold queries, filtered according to live metadata.
Reasoning Engine: Selects between fast-path template adaptation and explicit multi-step Tree-of-Thought decomposition, maintaining a journal of intermediate steps and validations.
Tool Execution: Generates and executes SQL/API calls against AWS, Okta, and Google Workspace, leveraging normalized tabular representations via an embedded query engine (e.g., DuckDB).
Evidence Synthesis: Aggregates platform-specific results, producing natural-language answers with citations to exact queries and returned data rows (Engelberg et al., 11 Jan 2026).

This stack is tightly coupled with a schema-grounded orchestration layer—“Identity Data Connector”—that normalizes disparate API and platform outputs, supporting cross-platform data joins and consolidated identity views.
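
The normalization role described for the Identity Data Connector can be illustrated with a minimal sketch. It assumes each platform returns records of a different shape and that a connector maps them onto one tabular row type. The `IdentityRecord` type, the `normalize` function, and every raw field name below are hypothetical placeholders chosen for illustration; they are not taken from the primary source or from any platform API.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class IdentityRecord:
    """One normalized row; the column set is a hypothetical choice."""
    platform: str
    email: str
    mfa_enabled: bool


# Each normalizer maps one platform's raw record shape onto IdentityRecord.
# The raw field names are illustrative assumptions, not real API fields.
def from_aws(raw: dict[str, Any]) -> IdentityRecord:
    return IdentityRecord("aws", raw["email_tag"].lower(), bool(raw.get("mfa_devices")))


def from_okta(raw: dict[str, Any]) -> IdentityRecord:
    return IdentityRecord("okta", raw["profile"]["email"].lower(), bool(raw.get("factors")))


def from_gws(raw: dict[str, Any]) -> IdentityRecord:
    return IdentityRecord("gws", raw["primary_email"].lower(), bool(raw.get("two_step_enrolled")))


NORMALIZERS: dict[str, Callable[[dict[str, Any]], IdentityRecord]] = {
    "aws": from_aws,
    "okta": from_okta,
    "gws": from_gws,
}


def normalize(platform: str, raw_rows: list[dict[str, Any]]) -> list[IdentityRecord]:
    """Convert raw platform records into the shared tabular representation."""
    to_record = NORMALIZERS[platform]
    return [to_record(row) for row in raw_rows]
```

Lower-casing the email in every normalizer is one simple way to make the later cross-platform join key comparable; a production connector would likely need richer matching rules.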

2. Natural-Language–to–Data-Exploration Pipeline

The agent’s data-grounded reasoning pipeline is algorithmically formalized as follows:

Query Parsing: ISPM questions are parsed and classified, extracting target entities and ISPM dimension.
Schema and Example Retrieval: The agent loads relevant platform schemas, filtering example SQL queries for high structural compatibility.
Execution Mode Selection: Using an embedding-based schema-template similarity score, the agent takes the fast path when the score exceeds a data-driven threshold ($\tau$); otherwise it invokes full-path decomposition.
Full-Path Decomposition: The question is broken into sub-queries $S_1,\ldots,S_n$. For each sub-query the agent generates a SQL statement, executes it, and evaluates explicit step-wise success criteria. Failed criteria trigger further decomposition or plan refinement.
Aggregation and Synthesis: All intermediate and final results are merged, with a natural-language answer generated that exposes supporting queries and rows. A full-path “step journal” captures the entire thought process and execution trace (Engelberg et al., 11 Jan 2026).

Pseudocode found in the primary source formalizes this pipeline, with algorithmic blocks for plan generation, SQL synthesis, tool execution, stepwise validation, and evidence consolidation.
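
The pseudocode itself is not reproduced here. As a rough illustration of the control flow described in the steps above, the following sketch shows threshold-based mode selection and a full-path loop that records every attempt in a step journal. All names (`select_mode`, `run_full_path`, `Step`, `Journal`) and the callback signatures are assumptions made for this example, and the retry-on-failure loop is a simplification that stands in for the further decomposition or plan refinement described in the source.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    sub_question: str
    sql: str
    rows: list
    passed: bool


@dataclass
class Journal:
    mode: str
    steps: list[Step] = field(default_factory=list)


def select_mode(similarity: float, tau: float) -> str:
    """Fast path only when similarity exceeds the threshold tau."""
    return "fast" if similarity > tau else "full"


def run_full_path(
    sub_questions: list[str],
    to_sql: Callable[[str, int], str],    # (sub-question, attempt number) -> SQL
    execute: Callable[[str], list],       # SQL -> result rows
    check: Callable[[str, list], bool],   # step-wise success criterion
    max_retries: int = 1,
) -> Journal:
    """Run each sub-question, journaling every attempt, passed or failed."""
    journal = Journal(mode="full")
    for sub_question in sub_questions:
        for attempt in range(max_retries + 1):
            sql = to_sql(sub_question, attempt)
            rows = execute(sql)
            passed = check(sub_question, rows)
            journal.steps.append(Step(sub_question, sql, rows, passed))
            if passed:
                break  # criterion met; move on to the next sub-question
    return journal
```

Because failed attempts are journaled alongside successful ones, the resulting trace shows not only the final queries but also which intermediate queries were rejected.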

3. Data Sources, Schema Management, and Cross-Platform Correlation

Sola AI Agents are engineered for multi-platform identity and security environments:

AWS IAM: Tables include IAM users, roles, groups, customer- and AWS-managed policies, access keys, MFA settings, password policies, and CloudTrail-derived logs.
Okta: Includes user/group inventories, sign-on/MFA policies, application assignments, and IdP (SAML) configuration.
Google Workspace: Covers users, organizational units, Super-Admin flags, 2-Step Verification status, OAuth allow-lists, and Drive-sharing.
Global Identity Directory: Enables correlation for entities (e.g., by email, SAML subject) across schemas, supporting queries such as “Which users lack MFA in any system?” via schema-agnostic linking (Engelberg et al., 11 Jan 2026).

Upon each query, the agent loads the latest schema metadata, ensures SQL generation conforms to platform constraints, and normalizes query results to enable evidence-backed multi-platform answers.
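
A question such as “Which users lack MFA in any system?” can be sketched as a single query over normalized per-platform tables, with the answer returned together with its supporting query and rows. The example below uses Python's built-in `sqlite3` module purely as a stand-in for the embedded query engine, so that it runs without extra dependencies. The table names, column names, and sample rows are invented for illustration and do not come from the benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical normalized tables, one per platform, keyed by email.
conn.executescript("""
CREATE TABLE aws_users  (email TEXT, mfa_enabled INTEGER);
CREATE TABLE okta_users (email TEXT, mfa_enabled INTEGER);
CREATE TABLE gws_users  (email TEXT, mfa_enabled INTEGER);
INSERT INTO aws_users  VALUES ('ana@example.com', 1), ('bo@example.com', 0);
INSERT INTO okta_users VALUES ('ana@example.com', 1), ('bo@example.com', 1);
INSERT INTO gws_users  VALUES ('ana@example.com', 1), ('cy@example.com', 0);
""")

# Stack the platform tables into one identity view, then filter on MFA.
QUERY = """
SELECT email, platform FROM (
    SELECT email, 'aws'  AS platform, mfa_enabled FROM aws_users
    UNION ALL
    SELECT email, 'okta' AS platform, mfa_enabled FROM okta_users
    UNION ALL
    SELECT email, 'gws'  AS platform, mfa_enabled FROM gws_users
) AS identities
WHERE mfa_enabled = 0
ORDER BY email, platform
"""


def answer_with_evidence(connection: sqlite3.Connection, query: str) -> dict:
    """Return a natural-language answer plus the exact query and rows behind it."""
    rows = connection.execute(query).fetchall()
    users = sorted({email for email, _ in rows})
    answer = f"{len(users)} user(s) lack MFA in at least one system: {', '.join(users)}"
    return {"answer": answer, "query": query.strip(), "rows": rows}


result = answer_with_evidence(conn, QUERY)
```

Keeping the query text and the returned rows next to the generated sentence is what makes the answer checkable by a reviewer.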

4. Benchmarking, Evaluation Metrics, and Quantitative Results

The Sola Visibility ISPM Benchmark consists of 77 natural-language tasks partitioned into Inventory and Hygiene (AWS, Okta, GWS) domains, each grounded in production-grade enterprise identity environments. Core metrics include:

Expert Accuracy: Ordinal, per-question metric: $\text{accuracy} = (1/N) \sum_{i=1}^N \text{score}_i$, where $\text{score}_i \in \{0, 0.5, 1\}$ from five human experts.
Expert Success Rate: Strict proportion of perfectly answered questions: $\text{success rate} = (\#\{\text{score}_i=1\})/N$.
LLM-as-Judge Correctness: Parallel (min-pooled) scoring by LLM judges.
Sub-metrics: In full-path, additional criteria include AnswerRelevancy, Faithfulness, ReasoningCoherence, SQLSemanticAppropriateness, and ExampleAdaptation (Engelberg et al., 11 Jan 2026).
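
The two expert metrics follow directly from the formulas above and can be computed as in this small sketch. The function names and sample scores are illustrative. The per-question $\text{score}_i$ is taken as given, since how the five experts' ratings are combined into it is not specified here. The `min_pooled` helper reflects one reading of “min-pooled” judging, namely taking the lowest score any judge assigns to each question.

```python
ALLOWED_SCORES = {0, 0.5, 1}


def expert_accuracy(scores: list[float]) -> float:
    """Mean of per-question scores, each in {0, 0.5, 1}."""
    if not scores or any(s not in ALLOWED_SCORES for s in scores):
        raise ValueError("scores must be non-empty with values in {0, 0.5, 1}")
    return sum(scores) / len(scores)


def expert_success_rate(scores: list[float]) -> float:
    """Fraction of questions that received a perfect score of 1."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return sum(1 for s in scores if s == 1) / len(scores)


def min_pooled(judge_scores: list[list[float]]) -> list[float]:
    """Per-question minimum across judges (one inner list per judge)."""
    return [min(per_question) for per_question in zip(*judge_scores)]


# Illustrative scores for five questions (not benchmark data).
sample = [1, 1, 0.5, 0, 1]
accuracy = expert_accuracy(sample)          # (1 + 1 + 0.5 + 0 + 1) / 5
success_rate = expert_success_rate(sample)  # 3 perfect answers out of 5
```

Partial credit raises accuracy but not success rate, so accuracy is never lower than success rate on the same set of scores.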

Key results (Table 1, main text):

Domain	Expert Accuracy	Expert Success Rate
AWS Hygiene	0.95	0.90
Google Workspace	0.75	0.71
Okta	0.65	0.50
Inventory	0.75	0.64
Overall (77 Qs)	0.84	0.77
LLM Judgment	0.82	—

Performance is highest on AWS hygiene tasks (0.95 expert accuracy, 0.90 success rate), reflecting schema regularity and retrieval strength. GWS and Okta hygiene require more robust example adaptation due to schema heterogeneity.

5. Core Algorithms: Reasoning, Planning, and Adaptation

Sola AI Agents rely on three principal algorithmic components:

Embedding-Based Example Similarity: Retrieval of near-example SQL queries is driven by vector similarity computed between NL query embeddings and historical patterns.
Tree-of-Thought–Style Planning: For divergent or complex queries, multi-step reasoning iteratively decomposes questions, aligns SQL generation with schema semantics, and journals all steps—each checked against explicit success criteria before proceeding.
Evidence Aggregation and Verification: The final answer synthesizes information from all platforms, with citations to each SQL and returned evidence row, supporting auditability (Engelberg et al., 11 Jan 2026).

Fast-path reasoning enables low-latency adaptation for frequent or homogeneous questions, while the full-path approach provides fine-grained interpretability and alignment in complex, multi-platform queries.
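
Embedding-based example retrieval can be illustrated with plain cosine similarity over stored example vectors. This is a minimal sketch: the embedding model is not specified here, so the vectors below are toy two-dimensional values, and the function names and example queries are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: list[float],
    examples: list[tuple[str, list[float]]],
    k: int = 2,
) -> list[tuple[float, str]]:
    """Return the k stored SQL examples most similar to the query embedding."""
    scored = [(cosine(query_vec, vec), sql) for sql, vec in examples]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy example store: (SQL text, embedding). Values are invented.
EXAMPLES = [
    ("SELECT email FROM aws_users WHERE mfa_enabled = 0", [1.0, 0.0]),
    ("SELECT COUNT(*) FROM okta_users", [0.0, 1.0]),
    ("SELECT email FROM gws_users WHERE mfa_enabled = 0", [1.0, 1.0]),
]

nearest = top_k([1.0, 0.1], EXAMPLES, k=2)
```

The top similarity score from such a lookup is the kind of quantity that could be compared against the threshold $\tau$ when choosing between the fast path and the full path.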

6. Limitations and Prospective Research Directions

While Sola AI Agents demonstrate data-grounded agentic reasoning in ISPM, several limitations are noted:

Current Scope: Restricted to inventory and hygiene (visibility) questions; complex risk analytics, behavioral modeling, and active mitigation remain out of scope.
Schema Heterogeneity: Adapting examples to the heterogeneous GWS and Okta schemas can degrade performance; for GWS, the full-path ExampleAdaptation score is 0.32.
Auditability Limitation: Fast-path runs lack a stepwise reasoning trace.
Generality: Deployment and evaluation are presently limited to platforms and schemas with robust API/data access (Engelberg et al., 11 Jan 2026).

Future work targets multi-tenant and advanced governance tasks, cross-system behavioral reasoning, RL-based mode selection (fast/full-path) under latency constraints, higher-quality example retrieval through fine-tuning, and extension to identity sources beyond AWS/Okta/GWS, such as Azure AD and SaaS applications.

7. Theoretical and Broader Agentic Context

The design and operation of Sola AI Agents align with the broader SOL (“Self-Initiated Open-World Learning”) and SOLA (“Self-Initiated Open-World Continual Learning and Adaptation”) paradigms (Liu et al., 2021, Liu et al., 2022):

Autonomy: Sola agents self-initiate incremental learning cycles, continually detecting and characterizing novelty in non-i.i.d. environments.
Continual Adaptation: Instead of batch retraining, agents employ interaction modules for on-the-fly ground-truth acquisition and incremental model updates (e.g., gradient-based with rehearsal or regularization).
Agentic Capabilities: The architecture supports lifelong operation, open-world skill extension, and mixed-initiative interactions for label or clarification acquisition, with risk evaluation mechanisms to mediate exploratory adaptation in safety-critical contexts.

A plausible implication is that Sola AI agents represent a robust prototype for future agentic systems required to address high-stakes, open-world reasoning and data-driven exploration tasks, integrating security, safety, and continual adaptability. Sola’s deployment in ISPM constitutes a reference implementation, but the broader architectural and algorithmic template is suitable for domains including service robotics, autonomous navigation, and conversational agents (Liu et al., 2021, Liu et al., 2022).