Session Risk Memory (SRM)
- Session Risk Memory (SRM) is a deterministic session-level mechanism that combines per-action spatial checks with a temporal risk accumulator.
- It updates a compact internal state using semantic centroids and risk signals per turn to detect gradual attacks like slow-burn exfiltration and privilege escalation.
- SRM layers on top of existing stateless gates such as ILION; on an 80-session benchmark, the combined system reports 100% recall and a 0% false positive rate at roughly 240 μs of per-turn overhead.
Session Risk Memory (SRM) is a deterministic, trajectory-level authorization mechanism that extends stateless pre-execution safety gates to provide robust defenses against distributed, multi-step policy violations in agentic systems. While conventional authorization gates enforce spatial consistency—verifying whether individual actions align with assigned roles—SRM augments these systems with a temporal risk accumulator that captures evolving behavioral patterns across the entire agent session. This hybrid design enables the detection and mitigation of gradual, composite attacks that evade per-action thresholds, such as slow-burn data exfiltration, stepwise privilege escalation, and compliance drift. SRM achieves session-level safety with minimal computational overhead and without the need for additional model training, probabilistic inference, or fine-tuning (Chitan, 22 Mar 2026).
1. Motivation and Conceptual Framework
Stateless, per-action authorization gates such as ILION operate by evaluating each proposed agent action against semantic thresholds derived from its role, ensuring that overtly malicious or out-of-policy actions are blocked with deterministic, low-latency vetting. However, attackers can split harmful intent across a series of individually compliant steps—each benign in isolation—bypassing these stateless gates until the final harmful act is reached. SRM addresses this structural blind spot by introducing a session-level memory module that accumulates trajectory risk over time and flags sessions for intervention once the aggregate risk exceeds a predefined threshold. This introduces a conceptual distinction between spatial consistency (per-action) and temporal consistency (over the action sequence), furnishing a principled basis for “defense in depth” (Chitan, 22 Mar 2026).
2. Internal State, Equations, and Update Mechanism
SRM maintains a compact internal state, updating at each turn in the agent session according to deterministic, interpretable equations:
- Semantic Centroid: a smoothed vector centroid summarizing recent behavior (21-dimensional for ILION), updated at each turn by blending the previous centroid with the semantic embedding of the current action under a fixed, empirically chosen smoothing factor.
- Session Baseline Risk: an exponential moving average of the raw gate risk, computed over an initial warm-up window of turns and frozen thereafter.
- Risk Accumulation: the exponentially averaged, baseline-subtracted risk.
- Gate Risk Signal: the raw per-turn risk reported by the gate, scoring deviations from safety across multiple semantic dimensions.
- Session Flag: once the accumulated risk exceeds the session threshold, the session is irreversibly flagged.
The drift signal is included for extensibility, though in the ILION 21-D embedding it is saturated and is not currently weighted (Chitan, 22 Mar 2026).
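One way to write the update rules described above is sketched below. The notation is an assumption made for illustration, not the source's own: $e_t$ is the action embedding, $\rho_t$ the raw gate risk, $b_t$ the baseline, $W$ the warm-up length, $d_t$ the drift signal, $w_d$ its weight, $R_t$ the accumulated risk, $C_t$ the centroid, $\theta$ the flag threshold, and $\alpha$, $\beta$, $\lambda$ the smoothing factors. The clipping at zero and the cosine form of the drift are likewise assumptions, and no numeric values are implied.

$$
\begin{aligned}
b_t &= \begin{cases} (1-\beta)\, b_{t-1} + \beta\, \rho_t, & t \le W \\ b_W, & t > W \end{cases} \\
\tilde{\rho}_t &= \max(0,\ \rho_t - b_t) \\
d_t &= 1 - \cos(e_t,\ C_{t-1}) \\
r_t &= \tilde{\rho}_t + w_d\, d_t \\
R_t &= (1-\lambda)\, R_{t-1} + \lambda\, r_t \\
C_t &= (1-\alpha)\, C_{t-1} + \alpha\, e_t \\
\text{flag}_t &= \text{flag}_{t-1} \lor (R_t > \theta)
\end{aligned}
$$

Under this reading, setting $w_d = 0$ corresponds to the unweighted drift signal, and the last line makes the flag irreversible because it can only move from false to true.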
3. Integration with Deterministic Gates and System Operation
SRM is integrated with stateless agent gates such as ILION via a deterministic, non-intrusive update sequence per agent turn:
- Embed the proposed action to obtain its semantic embedding.
- Run the ILION gate, collecting gate scores (CVL, IDC, IRS, SVRF).
- Block the action immediately on any stateless veto (spatial consistency).
- Compute the raw risk.
- Update or freeze the risk baseline.
- Compute the baseline-subtracted risk.
- Optionally compute the drift signal.
- Calculate the per-turn session risk.
- Update the accumulated risk and the session centroid.
- Flag the session if the accumulated risk exceeds the threshold (temporal consistency).
SRM preserves the stateless gate’s immediate blocking of overtly unsafe actions while providing a temporal window to detect subtle escalation and distributed threats (Chitan, 22 Mar 2026).
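The per-turn sequence can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the reference implementation: the names `SRMConfig`, `SRMState`, `srm_step`, and `run_session` are hypothetical, every numeric parameter is an illustrative placeholder rather than a value from the source, and the gate is stubbed out, so the caller supplies the raw risk and the veto decision directly. Clipping the baseline-subtracted risk at zero and skipping the state update on a veto are also assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class SRMConfig:
    # All values below are illustrative placeholders, not taken from the source.
    alpha: float = 0.3         # centroid smoothing factor
    beta: float = 0.5          # baseline EMA factor during warm-up
    lam: float = 0.4           # accumulator EMA factor
    warmup_turns: int = 3      # baseline is frozen after this many turns
    drift_weight: float = 0.0  # drift is computed but unweighted
    threshold: float = 0.25    # session flag threshold


@dataclass
class SRMState:
    centroid: Optional[List[float]] = None
    baseline: float = 0.0
    accumulated: float = 0.0
    turn: int = 0
    flagged: bool = False


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return 1.0 - dot / (nu * nv)


def srm_step(state: SRMState, cfg: SRMConfig, embedding: Sequence[float],
             raw_risk: float, vetoed: bool) -> str:
    """Process one turn; returns 'block', 'flag', or 'allow'."""
    if vetoed:
        # Spatial consistency: the stateless veto blocks immediately.
        # Assumption: a vetoed action does not update the session state.
        return "block"
    state.turn += 1
    # Update the baseline during warm-up, then leave it frozen.
    if state.turn == 1:
        state.baseline = raw_risk
    elif state.turn <= cfg.warmup_turns:
        state.baseline = (1 - cfg.beta) * state.baseline + cfg.beta * raw_risk
    excess = max(0.0, raw_risk - state.baseline)  # baseline-subtracted risk
    drift = 0.0 if state.centroid is None else cosine_distance(embedding, state.centroid)
    turn_risk = excess + cfg.drift_weight * drift
    state.accumulated = (1 - cfg.lam) * state.accumulated + cfg.lam * turn_risk
    if state.centroid is None:
        state.centroid = list(embedding)
    else:
        state.centroid = [(1 - cfg.alpha) * c + cfg.alpha * e
                          for c, e in zip(state.centroid, embedding)]
    # Temporal consistency: the flag is sticky once raised.
    if state.accumulated > cfg.threshold:
        state.flagged = True
    return "flag" if state.flagged else "allow"


def run_session(risks: Sequence[float]) -> List[str]:
    """Feed a sequence of raw risks (no vetoes, constant embedding) through SRM."""
    state, cfg = SRMState(), SRMConfig()
    return [srm_step(state, cfg, [1.0, 0.0, 0.0], r, vetoed=False) for r in risks]


slow_burn = run_session([0.10, 0.10, 0.10, 0.30, 0.45, 0.60, 0.70])
benign = run_session([0.10] * 7)
print(slow_burn)
print(benign)
```

With these placeholder settings, the slow-burn session is flagged on its sixth turn and stays flagged, while the flat-risk session is never flagged, even though no single turn triggers a veto in either case.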
4. Spatial versus Temporal Authorization Consistency
SRM formalizes the distinction between:
- Spatial Consistency: Stateless gates ask, “Is this single action compatible with the agent’s role?”—effectively capturing overt violations and malicious payloads.
- Temporal Consistency: SRM asks, “Is the evolving session trajectory as a whole consistent with the agent’s assigned role?”—thereby capturing “slow-burn” attacks, privilege creep, and policy circumvention patterns that unfold over multiple steps.
Spatial authorization alone is blind to distributed threats; temporal authorization in isolation risks excessive false positives. By combining both, the framework provides two-dimensional, principled defense with interpretable, real-time decisions (Chitan, 22 Mar 2026).
5. Empirical Performance and Computational Properties
Evaluation on a benchmark of 80 multi-turn agent sessions—spanning slow-burn exfiltration, gradual privilege escalation, and compliance drift—demonstrates:
| System | Detection Rate (Recall) | False Positive Rate (FPR) | F1 Score | Per-Turn Overhead |
|---|---|---|---|---|
| ILION | 100% | 5% | 0.9756 | < 1 ms |
| ILION + SRM | 100% | 0% | 1.0000 | ~240 μs |
On this benchmark, adding SRM removes the false positives that ILION alone produces while keeping per-turn overhead well under a millisecond (dominated by embedding lookup and cosine calculation), which satisfies real-time constraints. This gain is achieved without model retraining or probabilistic inference (Chitan, 22 Mar 2026).
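As a sanity check on the table, the F1 scores can be recomputed from the reported recall and false positive rate. The sketch below assumes an even split of 40 attack and 40 benign sessions; that split is not stated in this section and is used only because it is consistent with the reported figures. The function name `f1_from_rates` is hypothetical.

```python
def f1_from_rates(recall: float, fpr: float, n_attack: int, n_benign: int) -> float:
    """Derive F1 from recall, false positive rate, and class counts."""
    tp = recall * n_attack   # attack sessions correctly flagged
    fn = n_attack - tp       # attack sessions missed
    fp = fpr * n_benign      # benign sessions wrongly flagged
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Assumed 40/40 class split (not given in the section).
ilion = f1_from_rates(recall=1.0, fpr=0.05, n_attack=40, n_benign=40)
ilion_srm = f1_from_rates(recall=1.0, fpr=0.0, n_attack=40, n_benign=40)
print(round(ilion, 4), round(ilion_srm, 4))  # 0.9756 1.0
```

Under that assumption, a 5% false positive rate corresponds to two benign sessions flagged in error, giving F1 = 80/82 ≈ 0.9756, which matches the ILION row.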
6. Limitations and Future Research Directions
Key limitations identified include:
- Embedding Expressivity: The 21-D keyword vector saturates the drift metric; the drift signal is retained for future high-dimensional, continuous embeddings (e.g., transformer-based representations).
- Structured Action Syntax: The benchmark uses structured sessions; real-world deployment will face more complex agent workflows, including branching, parallelism, and free-form text.
- Warmup Trade-Offs: Baseline subtraction eliminates false positives but ignores first-turn escalations; stateless gates still catch overt first-turn attacks, but adaptive alternatives may be considered.
- Extensibility: Proposed future work includes leveraging richer embeddings (which would allow the drift signal to be weighted), supporting session graphs, adopting adaptive or role-specific thresholds, and scaling up evaluation (Chitan, 22 Mar 2026).
A plausible implication is that SRM’s deterministic and interpretable approach makes it attractive as a front-end session-level safeguard, especially in enterprise environments where interpretability and low-latency risk management are critical.
7. Relation to Cross-Session and Retrieval-Augmented Session Risk
Session-level risk modeling also appears in retrieval-augmented LLM risk detectors, notably CS-VAR, which addresses risk by aggregating evidence across multiple sessions in domains such as live streaming. CS-VAR employs a graph-transformer backbone with cross-session memory and LLM-guided teacher distillation to surface recurring risk patterns and scam chains in real time (Qiao et al., 22 Jan 2026). However, CS-VAR's strategy is neither deterministic nor stateless: it combines learned sequence modeling, attention over patch grids, evidence retrieval via FAISS, and LLM-based knowledge transfer. Both SRM and CS-VAR foreground session-level trajectory modeling, but SRM emphasizes deterministic, interpretable, and deployable logic, while CS-VAR leverages LLM-augmented cross-session reasoning for risk detection in high-volume streaming environments (Chitan, 22 Mar 2026, Qiao et al., 22 Jan 2026).