I2I-STRADA: Structured Data Analysis

Updated 11 May 2026

I2I-STRADA is a modular agentic architecture that deconstructs data analysis into goal interpretation, contextual grounding, abstract planning, and adaptive execution.
It enhances planning coherence and insight alignment by mimicking expert human reasoning and enabling dynamic re-planning through a structured workflow.
Benchmark evaluations on DABstep and DABench show gains of 8–12 points (roughly 0.08–0.12 in score) in coherence and alignment over prior models.

I2I-STRADA (Information to Insights via Structured Reasoning Agent for Data Analysis) is a modular agentic architecture that automates complex data analysis by explicitly modeling human-like analytical reasoning steps. Unlike traditional LLM-based agentic frameworks that treat the reasoning process as a series of black-box calls, I2I-STRADA decomposes analysis into four cognitive sub-tasks: goal interpretation, contextual grounding, abstract planning, and adaptive execution. This structured approach improves planning coherence and insight alignment, as demonstrated on the DABstep and DABench benchmarks, and reflects expert data analysis workflows in machine-interpretable form (Sundar et al., 23 Jul 2025).

1. Motivation: Limitations of Orchestration-Based Systems

Existing agentic data analysis systems frequently rely on multi-agent orchestrations and procedural automation, enabling operations such as query translation, data transformation, and visualization. However, these systems typically:

Treat the LLM as a monolithic or black-box solver, dispatched with isolated queries, code, or subtasks, without encoding the expert workflow underlying data analysis.
Fail to address key cognitive stages: they misinterpret goals by over-generalizing or under-specializing free-form user queries, disregard data-specific rules (lack of contextual grounding), generate incoherent or ill-formed multi-step plans, and cannot adaptively re-plan when initial assumptions are invalidated.
Lack explicit modules for steps such as intent extraction, domain knowledge incorporation, structured planning, or dynamic correction, leading to brittle and opaque agentic behavior (Sundar et al., 23 Jul 2025).

I2I-STRADA addresses these deficiencies by formalizing the reasoning pipeline, ensuring each analytical sub-task is both modular and transparent.

2. Architectural Modules and Formal Specification

I2I-STRADA operationalizes the analytical workflow through four core, composable modules, each corresponding to a human-analyst subtask:

2.1 Goal Interpreter

Function: Extracts high-level intent, entities, and constraints from the natural language query $Q$.
Formal output: Initial belief state $\mathbf{B}_0 = (\text{intent}, E, C)$, where $E$ is the set of entities and $C$ is the set of constraints.
Algorithmic sketch: Parses $Q$ to produce $\mathbf{B}_0 = \phi_\mathrm{interp}(Q)$.
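As a rough illustration of the belief state $\mathbf{B}_0 = (\text{intent}, E, C)$, the following minimal sketch models it as a small data structure. The source describes an LLM-driven interpreter; the keyword table, the `interpret` function, and the year-based constraint rule here are hypothetical stand-ins chosen only to keep the example runnable.

```python
from dataclasses import dataclass, field
import re


@dataclass(frozen=True)
class BeliefState:
    """Initial belief B_0 = (intent, E, C)."""
    intent: str
    entities: frozenset = field(default_factory=frozenset)
    constraints: tuple = ()


# Hypothetical keyword table; a real interpreter would query an LLM instead.
INTENT_KEYWORDS = {
    "average": "aggregate",
    "total": "aggregate",
    "compare": "compare",
    "trend": "trend",
}


def interpret(query: str, known_entities: set) -> BeliefState:
    """Toy stand-in for phi_interp: map a free-form query Q to B_0."""
    text = query.lower()
    # First matching keyword decides the intent; fall back to "describe".
    intent = next((v for k, v in INTENT_KEYWORDS.items() if k in text), "describe")
    # Entities are the known names that literally appear in the query.
    entities = frozenset(e for e in known_entities if e.lower() in text)
    # Treat "in <4-digit year>" as a simple temporal constraint (assumption).
    constraints = tuple(f"year == {y}" for y in re.findall(r"\bin (\d{4})\b", text))
    return BeliefState(intent, entities, constraints)


b0 = interpret(
    "What was the average fee per merchant in 2023?",
    {"fee", "merchant", "card_scheme"},
)
print(b0)
```

Keeping the belief state immutable makes each later stage return a new state rather than editing the old one, which matches the state-transition view of the pipeline.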

2.2 Knowledge Grounder

Function: Grounds the initial belief state in contextual metadata $M$ and Standard Operating Procedures (SOPs) $S$.
Formal output: Grounded belief $\mathbf{B} = \phi_\mathrm{ground}(\mathbf{B}_0, M, S)$.
Mechanism: Matches relevant metadata and SOP rules to $\mathbf{B}_0$, accumulating the necessary contextual constraints.
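The grounding step can be sketched as a function that filters metadata and appends SOP-derived constraints. This is only an illustrative sketch: the dictionary layout, the `applies_to` and `rule` keys, and the subset-based matching rule are assumptions, not the representation used by the authors.

```python
def ground(b0: dict, metadata: dict, sops: list) -> dict:
    """Toy stand-in for phi_ground: build a grounded belief B from B_0, M, and S."""
    entities = set(b0["entities"])
    # Keep only metadata entries describing an entity mentioned in the query.
    relevant_meta = {col: desc for col, desc in metadata.items() if col in entities}
    # Assumption: an SOP applies when all of its trigger entities are present.
    sop_constraints = [s["rule"] for s in sops if set(s["applies_to"]) <= entities]
    return {
        **b0,
        "metadata": relevant_meta,
        "constraints": list(b0["constraints"]) + sop_constraints,
    }


b0 = {"intent": "aggregate", "entities": ["fee", "merchant"], "constraints": ["year == 2023"]}
M = {
    "fee": "transaction fee in EUR",
    "merchant": "merchant name",
    "country": "ISO country code",
}
S = [
    {"applies_to": ["fee"], "rule": "exclude refunded transactions"},
    {"applies_to": ["country"], "rule": "use issuing country"},
]

b = ground(b0, M, S)
print(b["constraints"])
print(sorted(b["metadata"]))
```

Here the SOP about `country` is skipped because the query never mentions that entity, so only the fee-related rule is accumulated into the constraints.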

2.3 Abstract Planner

Function: Constructs a high-level strategy by sequencing abstract steps $s_1, \dots, s_n$, each reflecting a task template (e.g., “compute summary statistics”).
Formal output: Plan $P = \phi_\mathrm{plan}(\mathbf{B}) = (s_1, \dots, s_n)$.
Approach: Identifies relevant sub-tasks and orders them based on the grounded belief.
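One simple way to picture abstract planning is a lookup into a library of task templates, pruned by the grounded belief. The sketch below assumes such a library exists; `PLAN_TEMPLATES`, the step names, and the pruning rule are all hypothetical, and the source does not state that planning is implemented as a table lookup.

```python
# Hypothetical template library: intent -> ordered abstract steps.
PLAN_TEMPLATES = {
    "aggregate": [
        "load data",
        "apply filters",
        "group by entities",
        "compute summary statistics",
    ],
    "compare": [
        "load data",
        "apply filters",
        "split into groups",
        "compute summary statistics",
        "compare groups",
    ],
}

FALLBACK_PLAN = ["load data", "describe data"]


def plan(belief: dict) -> list:
    """Toy stand-in for the Abstract Planner: pick and prune a step template."""
    steps = list(PLAN_TEMPLATES.get(belief["intent"], FALLBACK_PLAN))
    # Drop the filter step when the grounded belief carries no constraints.
    if not belief.get("constraints"):
        steps = [s for s in steps if s != "apply filters"]
    return steps


with_filters = plan({"intent": "aggregate", "constraints": ["year == 2023"]})
without_filters = plan({"intent": "aggregate", "constraints": []})
print(with_filters)
print(without_filters)
```

The steps stay abstract on purpose: they name what must happen, not how, leaving concrete code generation to the execution stage.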

2.4 Execution Adapter

Function: Executes and, if necessary, adaptively revises the proposed plan on the actual data $D$.
Mechanism: For each abstract step $s_i$, generates executable tools or code, observes intermediate results, and updates the context $\mathbf{B}$, with a correction loop for unfinished tasks.
Formal specification: $R = \phi_\mathrm{exec}(P, D, \mathbf{B})$, where $P$ is the abstract plan and $R$ is the final response.
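The correction loop described above can be sketched as a retry-with-repair pattern. This is a minimal sketch under stated assumptions: the `execute` signature, the `tools` mapping from step names to callables, the `repair` callback, and the retry budget are all hypothetical. In the architecture itself, the tools would be generated code rather than hand-written functions.

```python
def execute(plan, data, tools, repair, max_retries=2):
    """Toy execution adapter: run each abstract step, repairing context on failure."""
    context = {"data": data, "log": []}
    for step in plan:
        for attempt in range(max_retries + 1):
            try:
                context = tools[step](context)
                context["log"].append((step, "ok"))
                break
            except Exception as err:  # broad catch is acceptable for a sketch
                context["log"].append((step, f"failed: {err}"))
                if attempt == max_retries:
                    raise
                # Correction loop: revise the context using the observed error.
                context = repair(step, err, context)
    return context


def compute_mean_fee(ctx):
    fees = [row["fee"] for row in ctx["data"]]  # KeyError if the column is misnamed
    ctx["result"] = sum(fees) / len(fees)
    return ctx


def lowercase_columns(step, err, ctx):
    # Hypothetical repair: normalise column names after a failed lookup.
    ctx["data"] = [{k.lower(): v for k, v in row.items()} for row in ctx["data"]]
    return ctx


rows = [{"Fee": 2.0}, {"Fee": 4.0}]
out = execute(
    ["compute mean fee"],
    rows,
    tools={"compute mean fee": compute_mean_fee},
    repair=lowercase_columns,
)
print(out["result"])
print(out["log"])
```

The first attempt fails because the column is named `Fee`, the repair renames it, and the second attempt succeeds. Only the failing step is retried, so the rest of the plan does not restart.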

These modules compose a hierarchical and adaptive reasoning pipeline, enacting a sequence of state transitions $Q \rightarrow \mathbf{B}_0 \rightarrow \mathbf{B} \rightarrow P \rightarrow R$ (query, initial belief, grounded belief, plan, response), mapping from free-form query to final response.

3. Structured Reasoning Workflow

The overall workflow is a state-transition system parameterized by composable module transformations:

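$R = \phi_\mathrm{exec}\big(\phi_\mathrm{plan}(\mathbf{B}),\, D,\, \mathbf{B}\big), \quad \mathbf{B} = \phi_\mathrm{ground}\big(\phi_\mathrm{interp}(Q),\, M,\, S\big)$

Here $\phi_\mathrm{plan}$ and $\phi_\mathrm{exec}$ denote the Abstract Planner and the Execution Adapter, $D$ is the data, and $R$ is the final response. Each $\phi$ operator realizes the corresponding cognitive transformation, with the Execution Adapter’s feedback loop ($\phi_\mathrm{exec}$) enabling dynamic re-planning and failure recovery.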
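To make the state-transition view concrete, the sketch below wires four interchangeable stage functions into one pipeline and exposes every intermediate state through an optional callback. The function name `run_pipeline`, its parameters, and the `on_state` hook are illustrative assumptions; the lambdas are trivial stand-ins used only to exercise the wiring.

```python
def run_pipeline(query, metadata, sops, data, *, interp, ground, plan, execute,
                 on_state=None):
    """Chain the four stages: query -> initial belief -> grounded belief -> plan -> response."""
    notify = on_state or (lambda name, state: None)
    b0 = interp(query)
    notify("initial belief", b0)
    belief = ground(b0, metadata, sops)
    notify("grounded belief", belief)
    steps = plan(belief)
    notify("plan", steps)
    response = execute(steps, data, belief)
    notify("response", response)
    return response


# Trivial stand-ins for the four modules (hypothetical, for illustration only).
trace = []
result = run_pipeline(
    "How many rows are there?",
    metadata={},
    sops=["skip null rows"],
    data=[1, None, 3],
    interp=lambda q: {"intent": "count"},
    ground=lambda b0, m, s: {**b0, "rules": list(s)},
    plan=lambda b: ["drop nulls", "count rows"],
    execute=lambda steps, d, b: len([x for x in d if x is not None]),
    on_state=lambda name, state: trace.append(name),
)
print(result, trace)
```

Passing the stages as arguments keeps each one replaceable and testable in isolation, and the callback gives a natural place for logging or analyst inspection of intermediate states.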

A plausible implication is that such modularization fosters progressive abstraction (filtering and refining context at each step) and multi-step refinement (allowing for iterative corrections aligned with expert practice).

4. Quantitative Evaluation: Metrics and Empirical Results

4.1 Metrics

I2I-STRADA introduces two principal metrics for structured agentic evaluation:

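Metric	Definition	Score Range
Coherence	$1 - \mathrm{ED}_\mathrm{norm}(P_\mathrm{pred}, P_\mathrm{gold})$, based on the normalized edit distance between predicted and gold plans	$[0, 1]$
Insight Alignment	$\frac{2 \cdot \mathrm{Prec} \cdot \mathrm{Rec}}{\mathrm{Prec} + \mathrm{Rec}}$, the $F_1$ score over insight units	$[0, 1]$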

Coherence rewards plans that closely match gold-standard abstract step sequences.
Alignment quantifies overlap between generated and gold insights, balancing recall and precision.
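Both metrics can be approximated in a few lines. The sketch below computes coherence from a Levenshtein distance over step sequences and alignment as an F1 score over sets of insight units. Dividing by the longer plan's length and matching insight units by exact equality are assumptions made for this example; the source does not specify either choice.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of abstract steps."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def coherence(pred, gold):
    """One minus edit distance, normalised by the longer plan (assumption)."""
    longest = max(len(pred), len(gold))
    return 1.0 if longest == 0 else 1.0 - edit_distance(pred, gold) / longest


def insight_f1(pred, gold):
    """F1 over insight units, matched by exact equality (assumption)."""
    pred, gold = set(pred), set(gold)
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


gold_plan = ["load", "filter", "group", "summarize"]
pred_plan = ["load", "group", "summarize"]
print(coherence(pred_plan, gold_plan))  # one missing step out of four -> 0.75

print(insight_f1({"a", "b", "c"}, {"a", "b", "d", "e"}))  # precision 2/3, recall 1/2
```

In practice, insight units are free-text statements, so a real evaluation would likely need fuzzy or model-based matching rather than exact set membership.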

4.2 Benchmark Results

Empirical evaluation on DABstep and DABench benchmarks demonstrates I2I-STRADA’s superiority over prior models:

System	Coherence	Alignment
I2I-STRADA	0.82	0.79
Prior Model A	0.75	0.68
Prior Model B	0.70	0.64

On DABstep: coherence 0.84 (I2I-STRADA) vs. 0.76 (Model A), alignment 0.81 vs. 0.69, i.e., gains of 0.08 and 0.12, respectively.
On DABench: coherence 0.80 (I2I-STRADA), alignment 0.77, with 8–12 point improvements over baselines (Sundar et al., 23 Jul 2025).

These gains span both simple statistical summaries and complex machine learning workflows, reflecting improved adherence to human-like analytical processes.

5. Advantages of Structured Cognitive Workflows

Explicit separation of cognitive steps enables:

Progressive abstraction: Each module removes irrelevant detail while preserving critical context.
Multi-step refinement: Adaptive execution allows localized error correction through dynamic re-planning, rather than restarting the entire analytical process.
Improved transparency and interpretability: Each sub-task is exposed, facilitating oversight and human-in-the-loop adjustment.
Higher-fidelity planning and insight extraction: Consistency in abstract reasoning yields more precise, relevant, and complete analytic outputs.

This approach distinguishes I2I-STRADA from flat, black-box orchestration pipelines, supporting robust insight extraction and reliable data-driven decision-support.

6. Prospective Enhancements and Research Directions

Future avenues for I2I-STRADA include:

Hierarchical knowledge integration: Incorporation of domain ontologies and knowledge graphs into knowledge grounding to enhance semantic context modeling.
Learning-to-plan: Application of reinforcement learning to improve the sequencing and selection of planning templates in the Abstract Planner.
Real-time streaming support: Extension of the Execution Adapter to manage live, continuously updating data feeds and incremental computation.
Human-in-the-loop steering: Provision for analysts to inspect, intervene, and adjust belief states ($\mathbf{B}$) and subplans (individual abstract steps of the plan) at intermediate steps, facilitating collaborative analytics.

The modularization of cognitive tasks in I2I-STRADA thus provides a foundation for future advances in agentic data analysis that are robust, interpretable, and closely aligned with expert reasoning processes (Sundar et al., 23 Jul 2025).


References (1)

Sundar et al. (23 Jul 2025). I2I-STRADA -- Information to Insights via Structured Reasoning Agent for Data Analysis.
