Learning When to Act: Interval-Aware Reinforcement Learning with Predictive Temporal Structure

Published 23 Mar 2026 in cs.LG and cs.AI | (2603.22384v2)

Abstract: Autonomous agents operating in continuous environments must decide not only what to do, but when to act. We introduce a lightweight adaptive temporal control system that learns the optimal interval between cognitive ticks from experience, replacing ad hoc biologically inspired timers with a principled learned policy. The policy state is augmented with a predictive hyperbolic spread signal (a "curvature signal" shorthand) derived from hyperbolic geometry: the mean pairwise Poincare distance among n sampled futures embedded in the Poincare ball. High spread indicates a branching, uncertain future and drives the agent to act sooner; low spread signals predictability and permits longer rest intervals. We further propose an interval-aware reward that explicitly penalises inefficiency relative to the chosen wait time, correcting a systematic credit-assignment failure of naive outcome-based rewards in timing problems. We additionally introduce a joint spatio-temporal embedding (ATCPG-ST) that concatenates independently normalised state and position projections in the Poincare ball; spatial trajectory divergence provides an independent timing signal unavailable to the state-only variant (ATCPG-SO). This extension raises mean hyperbolic spread (kappa) from 1.88 to 3.37 and yields a further 5.8 percent efficiency gain over the state-only baseline. Ablation experiments across five random seeds demonstrate that (i) learning is the dominant efficiency factor (54.8 percent over no-learning), (ii) hyperbolic spread provides significant complementary gain (26.2 percent over geometry-free control), (iii) the combined system achieves 22.8 percent efficiency over the fixed-interval baseline, and (iv) adding spatial position information to the spread embedding yields an additional 5.8 percent.


Authors (1)

Davide Di Gioia

Summary

The paper introduces ATCPG, a novel framework that adjusts action intervals using predictive uncertainty signals from hyperbolic geometry.
The framework integrates a learned pacing policy and interval-aware rewards, achieving a 22.8% efficiency gain over a fixed-interval baseline and a reported 72.5% advantage over a TemporalController baseline.
Its lightweight design allows practical integration into digital assistants, robotic planners, and multi-agent systems with minimal computational overhead.

Introduction

The paper addresses a central problem in autonomous agent design: an agent must learn not only what action to take but also when to act. The problem is formalized as "adaptive cognitive pacing," which balances computational efficiency against the risk of missing environmental changes. The paper introduces the Adaptive Temporal Control via Predictive Geometry (ATCPG) framework, a reinforcement learning (RL) approach that adjusts the interval between actions using predictive uncertainty signals derived from hyperbolic geometry.

Framework Overview

The ATCPG framework is built on four interconnected components:

Learned Pacing Policy: A linear associative bandit model that adjusts the cognitive interval using reward-weighted regression, dynamically updating the frequency of agent deliberation.
Predictive Hyperbolic Spread: The mean pairwise Poincaré distance among n sampled futures embedded in the Poincaré ball. High spread indicates a branching, uncertain future and drives the agent to act sooner; low spread signals predictability and permits longer rest intervals.
Interval-Aware Reward: A reward that explicitly penalises inefficiency relative to the chosen wait time, correcting a systematic credit-assignment failure of naive outcome-based rewards in timing problems. It incorporates efficiency and uncertainty terms to guide pacing decisions.
Joint Spatio-Temporal Embedding (ATCPG-ST): Concatenates independently normalised state and position projections in the Poincaré ball, so that spatial trajectory divergence provides a timing signal unavailable to the state-only variant (ATCPG-SO).

Experimental Results

Ablation experiments across five random seeds support the design. Key findings include:

A 54.8% efficiency gain over the no-learning variant, making learning the dominant factor.
A 26.2% efficiency gain from hyperbolic spread over geometry-free control.
A 22.8% efficiency gain for the combined system over the fixed-interval baseline.
An additional 5.8% efficiency gain from the joint spatio-temporal embedding (ATCPG-ST) over the state-only variant (ATCPG-SO), with mean hyperbolic spread (kappa) rising from 1.88 to 3.37.

Ablation studies confirm the independent contributions of each component, with the framework's efficiency rooted in its ability to leverage predictive geometry and interval-aware rewards to optimize decision timing.
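
The learning component that the ablations identify as dominant is described as a linear bandit updated by reward-weighted regression. The following is a minimal sketch of that general technique, not the paper's code: the class name `PacingPolicy`, the feature layout, the interval bounds, the learning rate, and the choice to clip negative rewards to zero are all assumptions made for illustration. The reward is taken as an input; in ATCPG it would be the interval-aware reward, whose exact form is not given in this summary.

```python
class PacingPolicy:
    """Hypothetical linear pacing policy: interval = clip(w . features)."""

    def __init__(self, n_features, lr=0.1, min_interval=1.0, max_interval=60.0):
        self.w = [0.0] * n_features
        self.lr = lr  # assumed step size; needs lr * |features|^2 < 2 for stability
        self.min_interval = min_interval
        self.max_interval = max_interval

    def _raw(self, features):
        """Unclipped linear prediction."""
        return sum(w * x for w, x in zip(self.w, features))

    def predict(self, features):
        """Interval to wait before the next tick, clipped to the allowed range."""
        return min(max(self._raw(features), self.min_interval), self.max_interval)

    def update(self, features, tried_interval, reward):
        """One stochastic reward-weighted regression step.

        The prediction is pulled toward the interval that was actually tried,
        in proportion to the (non-negative) reward that interval earned.
        """
        weight = max(reward, 0.0)  # assumption: negative rewards carry no weight
        err = tried_interval - self._raw(features)
        step = self.lr * weight * err
        self.w = [w + step * x for w, x in zip(self.w, features)]
```

A typical feature vector under these assumptions would be `[1.0, spread]`, a bias term plus the hyperbolic spread. Intervals that earn high reward in a given context attract the prediction, while unrewarded intervals leave the weights unchanged.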

Comparative Analysis

The paper contrasts ATCPG's geometric approach with a TemporalController baseline privileged with direct overload signals. Despite the latter's cleaner reward-to-state correlation, ATCPG achieves a 72.5% efficiency advantage due to the amplified divergence inherent in hyperbolic geometry, underscoring ATCPG's applicability in realistic, partially observable environments.

Practical Implications and Future Directions

ATCPG's lightweight design, requiring minimal computational overhead, enables its integration into agent systems with existing world model predictions. This is especially beneficial for digital assistants, robotic planners, and multi-agent systems with non-trivial communication costs. However, the paper also outlines several areas for future research, such as evaluating ATCPG on large-scale benchmarks and refining world models for practical deployment in language-model-based systems.

Conclusion

In summary, the paper presents ATCPG, a framework for adaptive cognitive pacing in RL that uses predictive hyperbolic geometry to choose the interval between an agent's decisions. Its ablations report a 22.8% efficiency gain over a fixed-interval baseline, with a further 5.8% from the joint spatio-temporal embedding. By treating temporal pacing as a learned part of agent design rather than a fixed timer, ATCPG offers a principled way to govern decision frequency in computationally constrained environments.
