I2I-STRADA: Structured Data Analysis
- I2I-STRADA is a modular agentic architecture that deconstructs data analysis into goal interpretation, contextual grounding, abstract planning, and adaptive execution.
- It enhances planning coherence and insight alignment by mimicking expert human reasoning and enabling dynamic re-planning through a structured workflow.
- Benchmark evaluations on DABstep and DABench show significant improvements, with 8–12 point gains in coherence and alignment over prior models.
I2I-STRADA (Information-to-Insight via Structured Reasoning Agent for Data Analysis) is a modular agentic architecture for automating complex data analysis through explicit modeling of human-like analytical reasoning steps. Unlike traditional LLM-based agentic frameworks that treat the reasoning process as a series of black-box calls, I2I-STRADA decomposes analysis into four cognitive sub-tasks—goal interpretation, contextual grounding, abstract planning, and adaptive execution. This structured approach provides enhanced planning coherence and insight alignment, as demonstrated on DABstep and DABench benchmarks, and reflects expert data analysis workflows in machine-interpretable form (Sundar et al., 23 Jul 2025).
1. Motivation: Limitations of Orchestration-Based Systems
Existing agentic data analysis systems frequently rely on multi-agent orchestrations and procedural automation, enabling operations such as query translation, data transformation, and visualization. However, these systems typically:
- Treat the LLM as a monolithic or black-box solver, dispatched with isolated queries, code, or subtasks, without encoding the expert workflow underlying data analysis.
- Fail to address key cognitive stages: over-generalizing or under-specializing free-form user queries (misinterpretation of goals), disregarding data-specific rules (lack of contextual grounding), generating incoherent or ill-formed multi-step plans, and being unable to adaptively re-plan when initial assumptions are invalidated.
- Lack explicit modules for steps such as intent extraction, domain knowledge incorporation, structured planning, or dynamic correction, leading to brittle and opaque agentic behavior (Sundar et al., 23 Jul 2025).
I2I-STRADA addresses these deficiencies by formalizing the reasoning pipeline, ensuring each analytical sub-task is both modular and transparent.
2. Architectural Modules and Formal Specification
I2I-STRADA operationalizes the analytical workflow through four core, composable modules, each corresponding to a human-analyst subtask:
2.1 Goal Interpreter
- Function: Extracts high-level intent, entities, and constraints from the natural language query $q$.
- Formal output: Initial belief state $B_0 = (G, E, C)$, where $G$ is the extracted goal, $E$ is the set of entities, and $C$ is the set of constraints.
- Algorithmic sketch: Parses $q$ to output $B_0 = f_{\text{GI}}(q)$.
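A minimal Python sketch of the Goal Interpreter's interface follows, assuming a hypothetical `BeliefState` dataclass for $B_0$. The keyword table `GOAL_KEYWORDS`, the `YEAR_PATTERN` regular expression, and the `vocabulary` entity lexicon are all hypothetical stand-ins for the actual parsing mechanism, which this summary does not specify. The sketch only shows the shape of the mapping from $q$ to goal, entities, and constraints.

```python
import re
from dataclasses import dataclass, field


@dataclass
class BeliefState:
    """Hypothetical container for B0 = (G, E, C)."""
    goal: str                                           # G: extracted intent
    entities: set[str] = field(default_factory=set)     # E: referenced entities
    constraints: set[str] = field(default_factory=set)  # C: filters/conditions


# Hypothetical rules standing in for the real query-parsing mechanism.
GOAL_KEYWORDS = {
    "average": "compute summary statistics",
    "trend": "analyze temporal trend",
    "predict": "train predictive model",
}
YEAR_PATTERN = re.compile(r"\b(?:in|during)\s+(\d{4})\b")


def interpret_goal(query: str, vocabulary: set[str]) -> BeliefState:
    """Toy Goal Interpreter: map a free-form query q to a belief state B0."""
    text = query.lower()
    # First matching keyword decides the goal; fall back to a generic one.
    goal = next((g for kw, g in GOAL_KEYWORDS.items() if kw in text), "describe data")
    # Entities: terms from the (hypothetical) lexicon mentioned in the query.
    entities = {term for term in vocabulary if term.lower() in text}
    # Constraints: here only year filters such as "in 2023".
    constraints = {f"year == {year}" for year in YEAR_PATTERN.findall(text)}
    return BeliefState(goal, entities, constraints)


b0 = interpret_goal("What was the average fee per merchant in 2023?",
                    {"fee", "merchant", "country"})
print(b0.goal, sorted(b0.entities), b0.constraints)
# compute summary statistics ['fee', 'merchant'] {'year == 2023'}
```

A real interpreter would also resolve ambiguity in $q$; rule tables like these are only useful for showing what $B_0$ carries downstream.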
2.2 Knowledge Grounder
- Function: Grounds the initial belief state $B_0$ in contextual metadata $M$ and Standard Operating Procedures (SOPs) $S$.
- Formal output: Grounded belief $B_1 = f_{\text{KG}}(B_0, M, S)$.
- Mechanism: Matches relevant metadata and SOP rules to $B_0$, accumulating necessary contextual constraints.
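The grounding step can be sketched as a pure function $(B_0, M, S) \mapsto B_1$. In this sketch the metadata $M$ is assumed to be a column-to-description dictionary, and each SOP is a hypothetical `SopRule` that contributes a constraint when all of its trigger entities are present; entities with no metadata entry are dropped. These representations are illustrative assumptions, not the paper's data model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BeliefState:
    """Hypothetical belief state: goal, entities E, constraints C."""
    goal: str
    entities: frozenset[str] = frozenset()
    constraints: frozenset[str] = frozenset()


@dataclass(frozen=True)
class SopRule:
    """Hypothetical SOP rule: adds `constraint` when all `triggers` are in E."""
    triggers: frozenset[str]
    constraint: str


def ground(b0: BeliefState, metadata: dict[str, str], sops: list[SopRule]) -> BeliefState:
    """Toy Knowledge Grounder: (B0, M, S) -> B1."""
    # Keep only entities that the dataset metadata M actually describes.
    entities = frozenset(e for e in b0.entities if e in metadata)
    # Accumulate the constraint of every SOP rule triggered by those entities.
    sop_constraints = {rule.constraint for rule in sops if rule.triggers <= entities}
    return replace(b0, entities=entities, constraints=b0.constraints | sop_constraints)


b0 = BeliefState("compute summary statistics",
                 frozenset({"fee", "merchant", "colour"}), frozenset({"year == 2023"}))
metadata = {"fee": "transaction fee in EUR", "merchant": "merchant identifier"}
sops = [SopRule(frozenset({"fee"}), "exclude refunded transactions"),
        SopRule(frozenset({"country"}), "use ISO 3166 country codes")]
b1 = ground(b0, metadata, sops)
print(sorted(b1.entities), sorted(b1.constraints))
# ['fee', 'merchant'] ['exclude refunded transactions', 'year == 2023']
```

Keeping the belief state immutable makes each transition $B_0 \to B_1$ an explicit, inspectable step rather than an in-place mutation.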
2.3 Abstract Planner
- Function: Constructs a high-level strategy, sequencing abstract steps $p_1, \dots, p_n$, each reflecting a task template (e.g., “compute summary statistics”).
- Formal output: $P = (p_1, \dots, p_n) = f_{\text{AP}}(B_1)$.
- Approach: Identifies relevant sub-tasks and orders them based on the grounded belief.
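One way to picture the Abstract Planner is as selecting task templates and ordering them by their prerequisites. The sketch below assumes a hypothetical template library, `TEMPLATE_PREREQS`, in which each template lists the templates it depends on, and uses the standard-library `graphlib.TopologicalSorter` to produce a dependency-respecting order. The paper's actual selection and ordering procedure may differ.

```python
from graphlib import TopologicalSorter

# Hypothetical template library: each abstract step -> steps it depends on.
TEMPLATE_PREREQS: dict[str, set[str]] = {
    "load data": set(),
    "filter by constraints": {"load data"},
    "compute summary statistics": {"filter by constraints"},
    "encode features": {"filter by constraints"},
    "split train/test": {"filter by constraints"},
    "train predictive model": {"encode features", "split train/test"},
}


def plan(goal_step: str) -> list[str]:
    """Toy Abstract Planner: select the templates the goal needs, then order them."""
    needed: dict[str, set[str]] = {}
    stack = [goal_step]
    while stack:  # collect the goal's transitive prerequisites
        step = stack.pop()
        if step not in needed:
            needed[step] = TEMPLATE_PREREQS[step]
            stack.extend(needed[step])
    needed["report insights"] = {goal_step}  # always finish by reporting
    # graphlib expects node -> predecessors and yields a valid topological order.
    return list(TopologicalSorter(needed).static_order())


print(plan("compute summary statistics"))
# ['load data', 'filter by constraints', 'compute summary statistics', 'report insights']
print(plan("train predictive model"))
```

Independent steps (here feature encoding and the train/test split) may come out in either order; only the dependencies are guaranteed.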
2.4 Execution Adapter
- Function: Executes and, if necessary, adaptively revises the proposed plan on the actual data $D$.
- Mechanism: For each abstract step $p_k$, generates executable tools or code, observes intermediate results, and updates the execution context $\Gamma_k$, with a correction loop for unfinished tasks.
- Formal specification: $R = f_{\text{EA}}(P, B_1, D)$.
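The Execution Adapter's correction loop can be sketched as a work queue of abstract steps. Each step runs through a hypothetical `actions` registry, and when a step raises, a hypothetical `repair` callback returns replacement steps that are pushed to the front of the queue. The registry, the callback, and the `max_revisions` budget are assumptions for illustration; in I2I-STRADA the code for each step is generated rather than looked up.

```python
from collections import deque
from typing import Any, Callable

Context = dict[str, Any]
Action = Callable[[Context], None]
Repair = Callable[[str, Exception, Context], list[str]]


def execute(plan: list[str], actions: dict[str, Action], repair: Repair,
            context: Context, max_revisions: int = 3) -> Context:
    """Run abstract steps in order, revising the remaining plan on failure."""
    pending = deque(plan)
    revisions = 0
    while pending:
        step = pending.popleft()
        try:
            actions[step](context)                       # execute code for this step
            context.setdefault("done", []).append(step)  # record the observation
        except Exception as exc:
            if revisions >= max_revisions:
                raise RuntimeError(f"could not complete step {step!r}") from exc
            revisions += 1
            # Replacement steps go to the front of the queue, in order.
            pending.extendleft(reversed(repair(step, exc, context)))
    return context


# --- Hypothetical usage: a fee column that arrives as strings. ---
def load(ctx: Context) -> None:
    ctx["rows"] = [{"fee": "1.5"}, {"fee": "2.5"}, {"fee": "n/a"}]


def clean_fee(ctx: Context) -> None:
    ctx["rows"] = [{"fee": float(r["fee"])} for r in ctx["rows"] if r["fee"] != "n/a"]


def mean_fee(ctx: Context) -> None:
    fees = [r["fee"] for r in ctx["rows"]]
    ctx["mean_fee"] = sum(fees) / len(fees)  # TypeError while fees are strings


def repair(step: str, exc: Exception, ctx: Context) -> list[str]:
    if step == "mean fee" and isinstance(exc, TypeError):
        return ["clean fee", "mean fee"]  # insert a cleaning step, then retry
    raise exc


actions = {"load": load, "clean fee": clean_fee, "mean fee": mean_fee}
result = execute(["load", "mean fee"], actions, repair, {})
print(result["mean_fee"], result["done"])  # 2.0 ['load', 'clean fee', 'mean fee']
```

The first attempt at `mean fee` fails on string values, the repair inserts a cleaning step, and the retry succeeds, so only the unfinished part of the plan is revised rather than the whole analysis.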
These modules compose a hierarchical and adaptive reasoning pipeline, enacting the sequence of state transitions $q \to B_0 \to B_1 \to P \to R$ that maps the free-form query to the final response.
3. Structured Reasoning Workflow
The overall workflow is a state-transition system parameterized by composable module transformations:
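$$q \xrightarrow{f_{\text{GI}}} B_0 \xrightarrow{f_{\text{KG}}(\cdot,\, M,\, S)} B_1 \xrightarrow{f_{\text{AP}}} P \xrightarrow{f_{\text{EA}}(\cdot,\, B_1,\, D)} R$$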
Each operator $f_{\ast}$ realizes the corresponding cognitive transformation, with the Execution Adapter’s feedback loop (observing intermediate results and revising the remaining plan when a step fails) enabling dynamic re-planning and failure recovery.
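The state-transition chain $q \to B_0 \to B_1 \to P \to R$, together with a feedback edge from execution back to planning, can be sketched as a small driver over four stage callables. The `Stages` container, the `(answer, failure)` return convention of `execute`, and the `max_replans` budget are hypothetical; the toy stages in the usage only demonstrate the control flow.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

Belief = dict[str, Any]


@dataclass
class Stages:
    interpret: Callable[[str], Belief]                  # q -> B0
    ground: Callable[[Belief], Belief]                  # B0 -> B1 (M, S bound inside)
    plan: Callable[[Belief, Optional[str]], list[str]]  # (B1, feedback) -> P
    execute: Callable[[list[str], Belief], tuple[Optional[str], Optional[str]]]  # -> (R, failure)


def run(query: str, stages: Stages, max_replans: int = 2) -> str:
    """Compose the four stages; re-plan with execution feedback until success."""
    b1 = stages.ground(stages.interpret(query))
    feedback: Optional[str] = None
    for _ in range(max_replans + 1):
        plan = stages.plan(b1, feedback)
        answer, feedback = stages.execute(plan, b1)
        if feedback is None and answer is not None:
            return answer
    raise RuntimeError(f"no successful plan after {max_replans} re-plans: {feedback}")


# Toy stages: the first plan fails, and the planner uses the feedback to revise it.
toy = Stages(
    interpret=lambda q: {"query": q},
    ground=lambda b0: {**b0, "sop": "exclude refunded transactions"},
    plan=lambda b1, fb: ["mean fee"] if fb is None else ["clean fee", "mean fee"],
    execute=lambda p, b1: (("mean fee = 2.0", None) if "clean fee" in p
                           else (None, "fee column is text")),
)
print(run("What is the average fee?", toy))  # mean fee = 2.0
```

Passing the failure message back into `plan` is one simple way to realize the feedback edge; the belief state $B_1$ is computed once and reused across re-plans.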
A plausible implication is that such modularization fosters progressive abstraction (filtering and refining context at each step) and multi-step refinement (allowing for iterative corrections aligned with expert practice).
4. Quantitative Evaluation: Metrics and Empirical Results
4.1 Metrics
I2I-STRADA introduces two principal metrics for structured agentic evaluation:
| Metric | Definition | Score Range |
|---|---|---|
| Coherence | Similarity derived from the normalized edit distance between the predicted plan $P$ and the gold plan $P^{*}$ | $[0, 1]$ |
| Insight Alignment | $F_1$ score over insight units | $[0, 1]$ |
- Coherence rewards plans that closely match gold-standard abstract step sequences.
- Alignment quantifies overlap between generated and gold insights, balancing recall and precision.
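The two metrics can be sketched as follows, under stated assumptions: coherence is taken as one minus the Levenshtein distance between step sequences, normalized by the longer sequence, and insight units are compared by exact string match. Both the normalization and the exact-match rule are assumptions; the paper may normalize differently or match insights semantically.

```python
def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two step sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def coherence(pred: list[str], gold: list[str]) -> float:
    """Assumed form: 1 - edit distance normalized by the longer plan."""
    longest = max(len(pred), len(gold))
    return 1.0 if longest == 0 else 1.0 - edit_distance(pred, gold) / longest


def insight_alignment(pred: set[str], gold: set[str]) -> float:
    """F1 over insight units, assuming exact-match units."""
    hits = len(pred & gold)
    if hits == 0:
        return 0.0
    precision, recall = hits / len(pred), hits / len(gold)
    return 2 * precision * recall / (precision + recall)


gold_plan = ["load", "filter", "group", "mean", "report"]
pred_plan = ["load", "filter", "mean", "report"]
print(coherence(pred_plan, gold_plan))  # 0.8
print(insight_alignment({"A", "B", "C"}, {"A", "B", "D", "E"}))
```

The predicted plan omits one gold step, so one insertion fixes it and coherence is $1 - 1/5$; the insight example has precision $2/3$ and recall $1/2$, giving $F_1 = 4/7$.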
4.2 Benchmark Results
Empirical evaluation on DABstep and DABench benchmarks demonstrates I2I-STRADA’s superiority over prior models:
| System | Coherence | Alignment |
|---|---|---|
| I2I-STRADA | 0.82 | 0.79 |
| Prior Model A | 0.75 | 0.68 |
| Prior Model B | 0.70 | 0.64 |
- On DABstep: coherence 0.84 (I2I-STRADA) vs. 0.76 (Model A), alignment 0.81 vs. 0.69.
- On DABench: coherence 0.80 (I2I-STRADA), alignment 0.77, with 8–12 point improvements over baselines (Sundar et al., 23 Jul 2025).
These gains span both simple statistical summaries and complex machine learning workflows, reflecting improved adherence to human-like analytical processes.
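As a quick arithmetic check on the 8–12 point figure, the DABstep gains over Model A reported above convert to points on a 0–100 scale as follows (rounding to whole points is an assumption about how the figure was stated):

```python
# DABstep scores reported above: (I2I-STRADA, Prior Model A).
dabstep = {"coherence": (0.84, 0.76), "alignment": (0.81, 0.69)}


def point_gain(ours: float, baseline: float) -> int:
    """Score difference on a 0-100 point scale, rounded to whole points."""
    return round((ours - baseline) * 100)


gains = {metric: point_gain(*scores) for metric, scores in dabstep.items()}
print(gains)  # {'coherence': 8, 'alignment': 12}
```

Rounding absorbs floating-point noise such as `0.84 - 0.76` evaluating to slightly less than `0.08`.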
5. Advantages of Structured Cognitive Workflows
Explicit separation of cognitive steps enables:
- Progressive abstraction: Each module removes irrelevant detail while preserving critical context.
- Multi-step refinement: Adaptive execution allows localized error correction through dynamic re-planning, rather than restarting the entire analytical process.
- Improved transparency and interpretability: Each sub-task is exposed, facilitating oversight and human-in-the-loop adjustment.
- Higher-fidelity planning and insight extraction: Consistency in abstract reasoning yields more precise, relevant, and complete analytic outputs.
This approach distinguishes I2I-STRADA from flat, black-box orchestration pipelines, supporting robust insight extraction and reliable data-driven decision support.
6. Prospective Enhancements and Research Directions
Future avenues for I2I-STRADA include:
- Hierarchical knowledge integration: Incorporation of domain ontologies and knowledge graphs into knowledge grounding to enhance semantic context modeling.
- Learning-to-plan: Application of reinforcement learning to improve the sequencing and selection of planning templates in the Abstract Planner.
- Real-time streaming support: Extension of the Execution Adapter to manage live, continuously updating data feeds and incremental computation.
- Human-in-the-loop steering: Provision for analysts to inspect, intervene in, and adjust belief states ($B_0$, $B_1$) and subplans (steps $p_k$ of $P$) at intermediate steps, facilitating collaborative analytics.
The modularization of cognitive tasks in I2I-STRADA thus provides a foundation for future advances in agentic data analysis that are robust, interpretable, and closely aligned with expert reasoning processes (Sundar et al., 23 Jul 2025).