Papers
Topics
Authors
Recent
2000 character limit reached

Smart-TCP: Adaptive Agentic TCP

Updated 6 December 2025
  • Smart-TCP is a protocol paradigm that integrates LLM-driven reasoning and deterministic ALU arithmetic to dynamically manage TCP state and control flags.
  • It employs modular context aggregation and autonomous decision loops to achieve high accuracy in state transitions and anomaly detection.
  • Comparative studies with FlexTOE and PnO-TCP highlight Smart-TCP's superior performance in sustaining stateful, error-resilient data transfers.

Smart-TCP refers to TCP protocol designs and systems that leverage agentic AI, modular offload, or similar context-aware paradigms to dramatically enhance the adaptability, offloadability, and intelligence of TCP’s transport logic. Recent efforts span agentic, LLM-driven protocol agents such as Smart-TCP (Han et al., 29 Nov 2025), flexible programmable data-plane offloads exemplified by FlexTOE (Shashidhara et al., 2021), and transparent full-stack off-path offload architectures like PnO-TCP (Nan et al., 29 Mar 2025).

1. Agentic AI-based Protocols: Smart-TCP Core Architecture

Smart-TCP reconceptualizes TCP state machine logic as an autonomous agent, unifying three synergistic building blocks: a Context Aggregation (CA) mechanism, an LLM-based reasoning module, and a deterministic Arithmetic Logic Unit (ALU) tool. Each endpoint (client or server) operates in an agentic decision loop:

  • Context Aggregation synthesizes the protocol state SS, the most recent incoming segment RR, and a local action AA (e.g., on_receive, timeout_event) into a structured JSON object.
  • The LLM receives this context and autonomously infers four fields: next internal state S′S', TCP control flags FF, payload length PLP_L, and a high-level ALU task descriptor TtaskT_{task}. This is achieved via a formal interface:

(S′,F,PL,Ttask)=LLLM(S,R,A)(S',F,P_L,T_{task}) = \mathcal{L}_{LLM}(S,R,A)

The LLM is fine-tuned (Llama3-8B, LoRA) with SFT targeting cross-entropy minimization on model predictions.

  • The ALU tool is invoked on each iteration for 32-bit arithmetic essential to sequence and acknowledgment computation. The ALU is responsible for tasks such as:

(Seq,Ack)=LALU(Ttask,S,R)(\mathrm{Seq}, \mathrm{Ack}) = \mathcal{L}_{ALU}(T_{task}, S, R)

Typical operations are "CALCULATE_ACK" and "CALCULATE_SEQ_ACK," precisely matching standard TCP arithmetic semantics.

Smart-TCP endpoints interact in a peer-to-peer, dual-agent setup, which effectively replaces the hard-coded state machine of RFC 9293 with LLM-guided agentic reasoning—executing the handshake, data transfer, and termination phases through context-aware segment emission and reaction. This design demonstrates the feasibility of agentic TCP implementations able to adapt, infer, and detect protocol anomalies in ways unattainable with static logic (Han et al., 29 Nov 2025).

2. Modular Context Aggregation and LLM-based Decision Process

The Context Aggregation (CA) mechanism is central: it marshals protocol context—comprising state, receive event, and action—into a structured, serializable prompt. This context is encoded in JSON and appended to the LLM’s input, ensuring unambiguous parsing of multi-field state.

In each reasoning cycle, the LLM operates as follows:

  • System prompt: Defines the role as "autonomous TCP inference engine"
  • User prompt: Injects JSON-encoded SS, RR, and AA
  • Output: Predicts a precisely structured JSON dictionary with {S′,F,PL,Ttask}\{S', F, P_L, T_{task}\}

The actioned ALU task is strictly arithmetic and deterministic, maintaining fidelity with 32-bit sequence number, acknowledgment, and window arithmetic—including correct wraparound and incremental updates.

Decision workflow (pseudocode):

1
2
3
4
5
6
7
8
9
def SmartTCP_agent_loop():
    while True:
        R = wait_for_incoming_segment_or_timer()
        CA_input = AggregateContext(S, R, A)
        (S', F, P_L, T_task) = LLM_reason(CA_input)
        (Seq, Ack) = ALU_compute(T_task, S, R)
        G = assemble_segment(Seq, Ack, F, P_L)
        send(G)
        S = S'

The fine-tuned LLM enables robust state and flag prediction, outperforming pure LLM baselines on both end-to-end accuracy and anomaly detection (Han et al., 29 Nov 2025).

3. Comparison: Offload Architectures (FlexTOE, PnO-TCP)

Other Smart-TCP paradigms focus on offloading TCP logic from the host, leveraging either on-path SmartNIC NPUs (FlexTOE) or full off-path DPUs (PnO-TCP).

FlexTOE (Shashidhara et al., 2021) divides control- and data-plane responsibilities:

  • Host/SmartNIC control-plane: Manages connections, congestion control, and flow/buffer configuration
  • SmartNIC data-plane: Modular pipeline; each packet traverses fine-grained stages—pre-processing, protocol arithmetic, post-processing, DMA, notification.
  • Parallelism: Datapath modules are replicated for high throughput (e.g., S≈286×S\approx 286\times speedup), and flow-level reordering guarantees in-order semantics.

PnO-TCP (Nan et al., 29 Mar 2025) implements transparent off-load:

  • The entire TCP stack is offloaded to the BlueField DPU without application changes, using host shims and proxying all POSIX socket calls over DMA-batched message rings.
  • DPU-based user-space TCP/IP stack (with DPDK) manages the full state machine, congestion control, retransmission, and delivers significant host CPU savings (40–60%), as well as 34–127% RPS improvements for sub-2KB packets.

A comparative table follows:

System Adaptive/Agentic Offload Model Highlighted Capability
Smart-TCP Agentic LLM+ALU None (runs on host TCP) Adaptive logic, anomaly detection
FlexTOE No (modular FSM) On-path (SmartNIC pipeline) Fine-grained pipeline parallelism, eBPF API
PnO-TCP No (classic FSM) Full off-path (DPU) Transparent stack offload, high RPS gain

4. Experimental Results and Comparative Performance

Smart-TCP (Han et al., 29 Nov 2025):

  • Static field-level prediction:
    • Seq, Ack: 100% (Smart-TCP); Baselines: Seq ≈100%, Ack ≈49%
    • Flags: Smart-TCP 97.50%, Llama3 84.36%, Qwen2.5 88.32%, Gemma 26.36%
    • NewState: Smart-TCP 98.33%, baselines: 85–94%
    • Atomic packet accuracy: Smart-TCP 97.22%, Llama3 42.93%
  • Error detection (balanced test set): overall 94.5%; order error recall 93.0%; flag error recall 96.0%; baselines: 28.5–51.0%
  • End-to-end (30 sessions): Handshake 100%, Data Transfer 100%, Termination 93.33%, Overall 93.33%; baselines fail to sustain stateful data transfer or achieve 0% overall.
  • Interpretation: Decoupling LLM reasoning from arithmetic (ALU) is critical for functional end-to-end TCP protocol realization. Pure LLMs are inadequate for stateful 32-bit arithmetic over sustained sessions.

FlexTOE (Shashidhara et al., 2021):

  • Per-request CPU cycles (Memcached RPCs): FlexTOE ~1.7k (0% host TCP), TAS ~3.3k, Chelsio TOE ~8.9k, Linux TCP ~12.1k
  • 99.99th percentile RPC RTT: FlexTOE is 3.2×\times and 50% lower than Chelsio and TAS, respectively
  • Throughput: FlexTOE achieves up to 5.5×\times Linux, 4.9×\times Chelsio, 1.6×\times TAS
  • BlueField: up to 4×4\times TAS for single-connection RPCs

PnO-TCP (Nan et al., 29 Mar 2025):

  • RPS gain for small packets (<2KB): Redis GET +34%, Lighttpd +127%
  • Host CPU savings: 40–60% depending on application/thread
  • Microbenchmarks: 1.7×\times faster than Linux TCP with <4 host cores
  • p50/p99 latency decreases; maximum latency and jitter can rise due to DMA batching and PCIe variance.

5. Implementation Insights and Design Trade-Offs

Smart-TCP Implementation (Han et al., 29 Nov 2025):

  • Runs at transport layer, replacing standard RFC-9293 state machine logic
  • LLM serving via PyTorch (fine-tuned Llama3-8B/LoRA), ALU implemented in C++/Python with JSON-RPC API
  • Leverages raw sockets, TUN/TAP, or kernel hooks for packet I/O
  • Context aggregator and segment assembler: Python modules

Trade-offs:

  • Computational intensity for LLM inference and ALU invocation may constrain real-time performance
  • Baseline LLMs without ALU separation fail to support sustained protocol integrity, especially for arithmetic-heavy, stateful phases
  • Error detection capabilities are superior to deterministic FSMs or pure LLM baselines, confirming the value of context-perceptive agentic reasoning.

Offload Architectures:

  • FlexTOE modular pipeline design improves data-plane throughput but retains host-side control-plane logic
  • FlexTOE exposes XDP/eBPF plugin points for in-situ protocol/feature extension
  • PnO-TCP maximizes transparency for applications, achieves high host resource savings, but introduces additional DMA/PCIe-induced jitter and limits on DPU compute/memory bandwidth.

6. Outlook and Prospective Developments

A plausible implication is that the agentic paradigm (LLM+ALU) in Smart-TCP enables a qualitatively new regime of adaptability and robust anomaly detection, going beyond what can be realized with static, modular offload schemes. Meanwhile, FlexTOE and PnO-TCP illustrate distinct, practical strategies for tackling the increasing protocol processing demand in data centers and programmable networks—each with associated trade-offs (e.g., off-path vs. on-path, transparency vs. latency). Prospective advances include further optimization of context encoding and model inference speeds in Smart-TCP, integration of agentic elements into offload data planes, and adaptive DMA batching or multi-agent coordination in future hybrid stacks (Han et al., 29 Nov 2025Shashidhara et al., 2021Nan et al., 29 Mar 2025).

Whiteboard

Follow Topic

Get notified by email when new papers are published related to Smart-TCP.