Temporal CoT Prompting in Streaming LLMs

Updated 22 January 2026

Temporal Chain-of-Thought prompting is a formulation of CoT prompting in which a large language model's prompt exemplars are updated incrementally as test data arrives in batches, under streaming and input-length (memory) constraints.
Heuristic filters based on rationale correctness and depth select which exemplars to keep, and the prompt is truncated to stay within a strict token limit.
Experiments indicate that even prompts built from shallow or largely incorrect chains perform competitively with the Auto-CoT baseline.

Temporal Chain-of-Thought (T-CoT) prompting refers to a formulation of Chain-of-Thought (CoT) prompting for LLMs under streaming or sequential batch conditions, in which data arrives incrementally and prompt construction must adapt as new batches are processed. Rather than assuming the availability of the entire test set for exemplar selection and rationale generation, T-CoT approaches treat the arrival order of batches as a streaming constraint on prompt maintenance and update. This paradigm captures a more realistic operational scenario for LLM deployment, where complete test data is not available a priori, and prompts must be updated dynamically while remaining within strict input length limitations (Tang, 2023).

1. Formalization of Streaming Batch CoT Prompting

The problem comprises a test set $D$ of size $|D| = mN$, partitioned into $m$ sequential batches of $N$ examples each, to be processed by an LLM $M$. Batch $k$ supplies questions $q_1^{(k)}, \ldots, q_N^{(k)}$, and processing starts from a fixed initial prompt $P_1$. At each step $k$, the model generates CoT rationales $c_i^{(k)} \leftarrow M(P_k \,\|\, q_i^{(k)})$, where $\|$ denotes concatenation, forms the question–rationale pairs $S_k = \{ (q_i^{(k)} \,\|\, c_i^{(k)}) \}_{i=1}^{N}$, and updates the prompt:

$P_{k+1} = f(P_k \mid S_k)$

where $f$ is a black-box update function subject to the constraint $|P_k| \leq L_{\text{max}}$, with $L_{\text{max}}$ the model's maximum input length. The objective is to choose or learn $f$ so as to maximize final test accuracy (or another relevant metric) after all $m$ batches have been processed:

$\max_f ~ \operatorname{Acc}( M(P_{m+1}, D) ) ~~\text{subject to}~~ \forall k,~ |P_k| \leq L_{\text{max}}$

This formulation grounds T-CoT in an incremental, memory-constrained regime that departs from conventional static or full-dataset CoT prompting (Tang, 2023).
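The update loop of this formalization can be sketched in Python as a generic driver. This is a minimal sketch under stated assumptions: the names `run_streaming_cot` and `prompt_length` are hypothetical, the model $M$ and the update $f$ are arbitrary callables, a prompt is represented as a list of text segments, and $|P|$ is approximated by a whitespace token count rather than the model's tokenizer. The driver only checks the constraint $|P_k| \leq L_{\text{max}}$ after every update; it says nothing about how a good $f$ should be chosen.

```python
from typing import Callable, List, Sequence

# Hypothetical aliases: a prompt is a list of text segments
# (a fixed instruction prefix followed by question-rationale exemplars).
Prompt = List[str]
Model = Callable[[str], str]                    # M: input text -> rationale
Update = Callable[[Prompt, List[str]], Prompt]  # f(P_k | S_k)


def prompt_length(prompt: Prompt) -> int:
    """Stand-in for |P|: whitespace token count (an assumption; a real
    system would count tokens with the model's own tokenizer)."""
    return sum(len(segment.split()) for segment in prompt)


def run_streaming_cot(model: Model, p1: Prompt,
                      batches: Sequence[Sequence[str]],
                      update: Update, max_len: int) -> Prompt:
    """Process batches in arrival order and return P_{m+1}."""
    prompt = list(p1)
    if prompt_length(prompt) > max_len:
        raise ValueError("P_1 already violates L_max")
    for batch in batches:                           # k = 1, ..., m
        context = "\n\n".join(prompt)
        # c_i^{(k)} <- M(P_k || q_i^{(k)}), then pair each q with its rationale
        s_k = [q + "\n" + model(context + "\n\n" + q) for q in batch]
        prompt = update(prompt, s_k)                # P_{k+1} = f(P_k | S_k)
        if prompt_length(prompt) > max_len:
            raise ValueError("update violated |P| <= L_max")
    return prompt
```

For example, an update that keeps only the instruction prefix and the most recent pair satisfies the constraint trivially, whereas an update that concatenates every pair eventually raises the `ValueError`.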

2. Prompt Construction and Update Algorithms

In the streaming-batch setting, prompt construction requires sequential updating. The baseline (Auto-CoT) approach simply concatenates all new $(q \,\|\, c)$ pairs into the prompt at each step:

$P_{k+1} \leftarrow P_k \,\|\, S_k$
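To see how quickly this rule breaks the limit, suppose (as a simplifying assumption) that every question–rationale pair costs about $\ell$ tokens. Then $|P_{k+1}| = |P_1| + kN\ell$, and the first batch after which the prompt exceeds $L_{\text{max}}$ is the smallest $k$ with $|P_1| + kN\ell > L_{\text{max}}$, namely $k = \lfloor (L_{\text{max}} - |P_1|)/(N\ell) \rfloor + 1$. The helper `first_overflow_batch` below is hypothetical, and the token counts in the example are illustrative rather than values from the study.

```python
def first_overflow_batch(p1_len: int, batch_size: int, pair_len: int,
                         max_len: int) -> int:
    """Smallest k with p1_len + k * batch_size * pair_len > max_len.

    Models naive concatenation P_{k+1} = P_k || S_k, assuming every
    question-rationale pair costs exactly pair_len tokens (a simplification).
    """
    if p1_len > max_len:
        raise ValueError("the initial prompt already exceeds max_len")
    if batch_size <= 0 or pair_len <= 0:
        raise ValueError("batch_size and pair_len must be positive")
    growth = batch_size * pair_len  # tokens appended per batch
    return (max_len - p1_len) // growth + 1


# Illustrative numbers: |P_1| = 100, l = 80 tokens per pair, L_max = 4000.
print(first_overflow_batch(100, 60, 80, 4000))  # 1
print(first_overflow_batch(100, 8, 80, 4000))   # 7
```

With a batch of 60 questions the prompt already overflows after the first batch; even a batch of 8 overflows after the seventh.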

However, this quickly exceeds the $L_{\text{max}}$ constraint as batches accumulate. Empirical heuristics are therefore used to decide which $(q \,\|\, c)$ pairs are retained. Two principal criteria are investigated:

Correctness: retain only rationales $c_i^{(k)}$ that yield correct answers ("Correct-CoT"), or, alternatively, deliberately build prompts in which more than 50% of the chains are incorrect ("Wrong-CoT").
Depth (rationale length): filter on the number of lines in a rationale, with "Deep-CoT" keeping rationales of at least $\xi$ lines and "Shallow-CoT" keeping those with fewer than $\xi$ lines.
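The four selection criteria can be sketched as simple filters. This is an illustrative sketch, not the study's implementation: the `Exemplar` class and function names are hypothetical, correctness is assumed to be known from a gold label, depth is taken as the number of non-empty rationale lines, and the Wrong-CoT rule shown (keep every incorrect chain plus only as many correct ones as still leave incorrect chains a strict majority) is just one way to realize the "more than 50% incorrect" condition.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Exemplar:
    question: str
    rationale: str
    correct: bool  # assumed known, e.g. by comparing the final answer with a gold label

    @property
    def depth(self) -> int:
        # Depth = number of non-empty lines in the rationale.
        return sum(1 for line in self.rationale.splitlines() if line.strip())


def wrong_cot(pairs: List[Exemplar]) -> List[Exemplar]:
    # Keep all incorrect chains plus at most (#incorrect - 1) correct ones,
    # so incorrect chains form a strict majority; original order is preserved.
    wrong_idx = [i for i, p in enumerate(pairs) if not p.correct]
    right_idx = [i for i, p in enumerate(pairs) if p.correct]
    n_right = max(0, min(len(right_idx), len(wrong_idx) - 1))
    keep = set(wrong_idx) | set(right_idx[:n_right])
    return [p for i, p in enumerate(pairs) if i in keep]


def select_subset(pairs: List[Exemplar], criterion: str,
                  xi: Optional[int] = None) -> List[Exemplar]:
    if criterion == "correct":    # Correct-CoT
        return [p for p in pairs if p.correct]
    if criterion == "wrong":      # Wrong-CoT
        return wrong_cot(pairs)
    if criterion in ("deep", "shallow"):
        if xi is None:
            raise ValueError("depth criteria need a line threshold xi")
        if criterion == "deep":   # Deep-CoT: depth >= xi
            return [p for p in pairs if p.depth >= xi]
        return [p for p in pairs if p.depth < xi]  # Shallow-CoT: depth < xi
    raise ValueError(f"unknown criterion: {criterion}")
```

In this sketch Wrong-CoT returns an empty list for a batch with no incorrect chains, since the majority condition cannot be met; other handling (for example, keeping the previous prompt unchanged) would be equally reasonable.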

After selection, the prompt is truncated as needed to respect $L_{\text{max}}$. The procedure is summarized in the following pseudocode:

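for k = 1 to m:
    S_k = []
    for i = 1 to N:
        c_i = M(P_k + q_i^(k))             # CoT rationale generated with the current prompt
        S_k.append(q_i^(k) + c_i)          # question-rationale pair
    S̃_k = select_subset(S_k, criterion)   # criterion: correct, wrong, deep, or shallow
    P_{k+1} = truncate_to_max_length(P_k + S̃_k, L_max)   # enforce |P_{k+1}| <= L_max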

(Tang, 2023)
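The pseudocode leaves `truncate_to_max_length` unspecified. One plausible policy, assumed here rather than taken from the study, keeps the fixed initial prompt and drops the oldest exemplars first, so the prompt always holds the most recent window that fits. Token counts are approximated by whitespace splitting, and the signature below (a separate prefix and a list of exemplars) is a hypothetical refinement of the pseudocode's call.

```python
from typing import List


def n_tokens(text: str) -> int:
    # Whitespace token count as a stand-in for the model tokenizer (assumption).
    return len(text.split())


def truncate_to_max_length(prefix: str, exemplars: List[str],
                           max_len: int) -> List[str]:
    """Return the longest run of most-recent exemplars (list is oldest first)
    that fits together with the fixed prefix in max_len tokens."""
    budget = max_len - n_tokens(prefix)
    if budget < 0:
        raise ValueError("the fixed prefix alone exceeds max_len")
    kept: List[str] = []
    for ex in reversed(exemplars):  # walk from newest to oldest
        cost = n_tokens(ex)
        if cost > budget:
            break                   # drop this exemplar and everything older
        kept.append(ex)
        budget -= cost
    return list(reversed(kept))     # restore arrival order
```

Stopping at the first exemplar that does not fit, instead of skipping it and trying older, shorter ones, keeps the retained exemplars contiguous in arrival order; either choice satisfies $|P_{k+1}| \leq L_{\text{max}}$.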

3. Temporal Structure and the Notion of “Temporal CoT”

Despite the temporal terminology, the approach does not introduce an explicit model of temporal dependency or time decay across batches. The only modeled temporal aspect is the sequential index $k$: exemplars from earlier batches either persist in or are discarded from subsequent prompts. Apart from the prompt itself, there is no inter-batch memory, no cross-batch relational modeling, and no explicit tracking of temporal drift. The accumulation and pruning of exemplars are governed solely by input length and heuristic selection, not by dynamic or learned temporal mechanisms. In effect, “temporal order” reduces to the sequential batch index and prompt accumulation under streaming constraints, not a learned or inferentially modeled temporal chain (Tang, 2023).

4. Experimental Setup and Quantitative Findings

Experiments are conducted using OpenAI text-davinci-002 on four datasets, each divided into 10 streaming batches: MultiArith (arithmetic, batch size 60), GSM8K (arithmetic, 64), StrategyQA (commonsense, 32), and Letter (symbolic, 81). Baselines include Zero-Shot-CoT (a single “Let’s think step by step” prompt) and bootstrap Auto-CoT. The heuristic variants of $f$ evaluated are Correct-CoT vs. Wrong-CoT and Deep-CoT vs. Shallow-CoT.
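The experimental configuration can be written down as data, which also makes the length of each stream explicit: with $m = 10$ batches of $N$ questions, $|D| = mN$. The `StreamConfig` class is a hypothetical container, and the totals assume every batch is full; they are derived from the batch sizes above, not separately reported figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamConfig:
    name: str
    task: str
    batch_size: int       # N
    num_batches: int = 10  # m

    @property
    def stream_size(self) -> int:
        # |D| = m * N, assuming every batch is full.
        return self.batch_size * self.num_batches


CONFIGS = [
    StreamConfig("MultiArith", "arithmetic", 60),
    StreamConfig("GSM8K", "arithmetic", 64),
    StreamConfig("StrategyQA", "commonsense", 32),
    StreamConfig("Letter", "symbolic", 81),
]

for cfg in CONFIGS:
    print(cfg.name, cfg.task, cfg.stream_size)
```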

Main findings:

Wrong-CoT: Prompts in which more than half of the chain-of-thought exemplars are incorrect suffer only minimal performance degradation relative to Correct-CoT.
Shallow-CoT: Shorter, shallower rationales outperform Deep-CoT, presumably because they are less redundant and more token-efficient, which matters under a strict token budget as the prompt grows.
Both heuristics yield performance competitive with, and in some cases surpassing, the naive Auto-CoT baseline, while remaining within $L_{\text{max}}$ (Tang, 2023).
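The token-efficiency explanation offered for Shallow-CoT can be made concrete with a small calculation: under a fixed budget, the number of pairs of about $\ell$ tokens that fit is $\lfloor (L_{\text{max}} - |P_1|)/\ell \rfloor$, so shorter rationales leave room for more exemplars. The helper and token counts below are illustrative assumptions, not measurements from the study, and this arithmetic alone does not establish why shallow rationales perform better.

```python
def exemplar_capacity(max_len: int, prefix_len: int, pair_len: int) -> int:
    """Number of question-rationale pairs of ~pair_len tokens that fit
    after a fixed prefix of prefix_len tokens."""
    if pair_len <= 0:
        raise ValueError("pair_len must be positive")
    return max(0, (max_len - prefix_len) // pair_len)


# Hypothetical token counts: shallow pairs ~60 tokens, deep pairs ~150 tokens.
print(exemplar_capacity(4000, 100, 60))   # 65
print(exemplar_capacity(4000, 100, 150))  # 26
```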

5. Limitations and Potential Extensions

The current formulation has several substantial limitations:

No Learned Selection Strategy: The prompt update heuristic $f$ is hand-crafted based on correctness or rationale length. More general approaches could learn to score and select exemplars, potentially using a policy network or reinforcement learning to optimize final accuracy.
Lack of Inter-Batch Memory: Exemplar maintenance is limited to a flat list; there is no mechanism for time-decayed memory, clustering, or retrieval-based pools that could better capture distributional drift of incoming questions.
Static Heuristics: The correctness and depth thresholds are fixed and not adapted based on validation data or batch performance.
No Rich Temporal Modeling: The streaming framework does not exploit possible inter-chain or cross-batch dependencies that may arise in temporally drifting or context-evolving data. A more fully realized “temporal CoT” method would model such correlations, track latent state, or permit cross-batch reference (Tang, 2023).
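As an illustration of the time-decayed memory and learned scoring mentioned among these limitations, the sketch below ranks pooled exemplars by a quality score discounted by age, $\text{score} = q \cdot \gamma^{k - k_0}$, and greedily fills a token budget. This is a hypothetical extension, not part of Tang (2023): the `quality` values stand in for whatever learned or validation-based estimate such a method would use, and the decay factor $\gamma$ and all names are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PooledExemplar:
    name: str
    batch_index: int  # batch k_0 in which the rationale was generated
    quality: float    # hypothetical score, e.g. from a learned scorer
    tokens: int


def decayed_score(ex: PooledExemplar, current_batch: int, gamma: float) -> float:
    # Exponential time decay: an exemplar of age a is down-weighted by gamma ** a.
    age = current_batch - ex.batch_index
    return ex.quality * gamma ** age


def select_with_decay(pool: List[PooledExemplar], current_batch: int,
                      budget: int, gamma: float = 0.8) -> List[PooledExemplar]:
    """Greedy: visit exemplars by descending decayed score, adding each one
    that still fits the token budget and skipping any that would overflow."""
    ranked = sorted(pool, key=lambda e: decayed_score(e, current_batch, gamma),
                    reverse=True)
    chosen: List[PooledExemplar] = []
    used = 0
    for ex in ranked:
        if used + ex.tokens <= budget:
            chosen.append(ex)
            used += ex.tokens
    return chosen
```

With $\gamma = 0.8$, a high-quality exemplar from three batches ago ($1.0 \cdot 0.8^3 = 0.512$) ranks below a fresher one of lower raw quality ($0.7$), which is the kind of recency trade-off the current heuristics cannot express.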

A plausible implication is that extending streaming-batch CoT to incorporate learned, adaptive prompt update mechanisms and richer temporal dependencies could address current deficits and improve reasoning performance in dynamic, non-stationary environments.

6. Contextualization within Chain-of-Thought Prompting Research

The streaming-batch setting underscores a practical distinction from previous CoT methods, which assume full test-set visibility and offline prompt optimization. Prior work such as Auto-CoT used static, full-batch selection strategies that are unsuited to incremental deployment contexts. Tang's streaming approach foregrounds the challenge of balancing prompt informativeness, redundancy, and length within strict limits, and shows that even minimal heuristic filtering can maintain accuracy, and in the case of shallow rationales improve it. These findings motivate further research into adaptive, temporally aware prompt maintenance for robust LLM reasoning in real-world continuous data settings (Tang, 2023).

References

Tang (2023). Chain-Of-Thought Prompting Under Streaming Batch: A Case Study.