ReAct: Multi-Domain ML, LLM, and Network Security

Updated 17 January 2026

ReAct is a name shared by three domain-specific frameworks that improve ML reliability, LLM decision-making, and network security through minimal yet effective interventions.
In ML, it applies post-hoc activation clipping for robust out-of-distribution (OOD) detection, reducing FPR95; in LLMs, it interleaves reasoning and acting, raising task success rates.
In network defense, ReAct uses sliding-window Bloom filters and cross-switch protocols to mitigate AR-DDoS attacks, achieving near-zero legitimate drops under high-throughput scenarios.

ReAct names three distinct technical frameworks in machine learning, LLM reasoning, and network security. The name stands for “Rectified Activations” in OOD detection (Sun et al., 2021), “Reasoning + Acting” in in-context LLM decision pipelines (Yao et al., 2022), and “Reflection Attack Mitigation for Asymmetric Routing” in programmable network defense (Hay et al., 10 Jan 2026). Although the nomenclature overlaps, the design rationales and implementations are domain-specific, targeting practical limitations in deep learning reliability, model agent compositionality, and resilient, high-throughput AR-DDoS mitigation.

1. Rectified Activations for OOD Detection

ReAct, as presented in "ReAct: Out-of-distribution Detection With Rectified Activations" (Sun et al., 2021), addresses the challenge of model overconfidence on OOD inputs in deep neural networks. The primary mechanism is a post-hoc clipping of hidden-layer activations at inference, applied to pretrained architectures.

Let $h(x) \in \mathbb{R}^m$ be the feature vector at a chosen hidden layer for input $x$, and $f(x) = W^\top h(x) + b \in \mathbb{R}^K$ the pre-softmax logits.

The rectification procedure comprises:

Compute $h(x)$.
For a fixed threshold $c$, form $\bar{h}(x) = \min(h(x), c\,\mathbf{1})$ (element-wise).
Update logits via $f^{\mathrm{ReAct}}(x) = W^\top\,\bar{h}(x) + b$.
Apply any OOD score $S(x; f^{\mathrm{ReAct}})$ (MSP, energy score, etc.) and flag $x$ as OOD when $S(x; f^{\mathrm{ReAct}}) < \lambda$, with $\lambda$ set so that a prescribed TPR (e.g., 95%) is obtained for ID data.
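The four steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the function names, the `(m, K)` layout of `W`, and the choice of the energy score (log-sum-exp of the logits) as $S$ are assumptions made for the example.

```python
import numpy as np

def react_logits(h, W, b, c):
    """Clip features element-wise at c, then recompute the logits."""
    h_bar = np.minimum(h, c)      # h_bar(x) = min(h(x), c * 1)
    return h_bar @ W + b          # W assumed to have shape (m, K)

def energy_score(logits):
    """Log-sum-exp of the logits; higher values look more in-distribution."""
    z = logits.max(axis=-1, keepdims=True)   # subtract the max for stability
    lse = z + np.log(np.exp(logits - z).sum(axis=-1, keepdims=True))
    return lse.squeeze(-1)

def is_ood(h, W, b, c, lam):
    """Flag inputs whose score on the rectified logits falls below lam."""
    return energy_score(react_logits(h, W, b, c)) < lam

# Toy example: one input with m = 3 features and K = 2 classes.
h = np.array([[0.5, 3.0, 0.2]])
W = np.ones((3, 2))
b = np.zeros(2)
clipped = react_logits(h, W, b, c=1.0)   # the 3.0 activation is clipped to 1.0
```

Because the clipping happens on stored features at inference time, the pretrained weights `W` and `b` are reused unchanged.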

Threshold selection is percentile-based using held-out ID validation: $c$ is chosen as the $p$th percentile (typically $p = 90$) of all features in the validation set.
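A minimal sketch of this percentile rule, assuming the validation features have already been extracted into an array. The function name and the default `p=90.0` are assumptions for illustration.

```python
import numpy as np

def react_threshold(val_features, p=90.0):
    """Return the p-th percentile over every activation in the ID validation set."""
    flat = np.asarray(val_features, dtype=float).ravel()   # pool all units and samples
    return float(np.percentile(flat, p))

# Toy validation set whose activations are simply 0, 1, ..., 100.
c = react_threshold(np.arange(101.0), p=90.0)
```

Pooling all units into one percentile gives a single scalar threshold, which matches the element-wise clipping against $c\,\mathbf{1}$.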

Theoretical analysis models ID activations as rectified Gaussians and OOD activations as epsilon-skew-normal, showing that the expected reduction from clipping increases with variance and skewness, characteristics that are amplified in OOD inputs. Clipping therefore suppresses OOD logit magnitudes more than ID ones and widens the separation between ID and OOD score distributions.
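The variance part of this argument can be checked numerically. The sketch below is a Monte Carlo illustration only, under assumed parameter values: it draws rectified Gaussian activations and measures the mean amount removed by clipping at $c$. It does not model the skewness term of the epsilon-skew-normal analysis.

```python
import numpy as np

def expected_clipping_reduction(mu, sigma, c, n=200_000, seed=0):
    """Estimate E[z - min(z, c)] for z = max(0, N(mu, sigma^2))."""
    rng = np.random.default_rng(seed)
    z = np.maximum(rng.normal(mu, sigma, size=n), 0.0)   # rectified Gaussian
    return float(np.mean(z - np.minimum(z, c)))

# Same mean and threshold, different spread (values chosen for illustration).
low = expected_clipping_reduction(mu=0.5, sigma=0.5, c=1.0)
high = expected_clipping_reduction(mu=0.5, sigma=1.5, c=1.0)
```

With these settings the wider distribution loses more mass to clipping, which is the direction the analysis predicts for high-variance activations.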

Empirical results demonstrate marked improvements:

On CIFAR-100 (ResNet-18, energy score): FPR95 decreases from 71.9% to 59.6%, AUROC improves from 82.8% to 87.5%.
On ImageNet-1k (ResNet-50): FPR95 decreases by 46% relative (from 58.4% to 31.4%) and AUROC increases from 86.2% to 93.0%.
On MobileNet-v2: a similarly robust reduction in FPR95 (17.4 percentage points).
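For readers unfamiliar with the metric, FPR95 can be computed as sketched below, assuming higher scores indicate ID data and ID is the positive class. The helper names are hypothetical; the second helper only reproduces the relative-reduction arithmetic quoted for ImageNet-1k.

```python
import numpy as np

def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """Return (FPR, threshold) when the threshold keeps `tpr` of ID scores above it."""
    lam = np.percentile(id_scores, 100.0 * (1.0 - tpr))   # ID scores >= lam count as ID
    fpr = np.mean(np.asarray(ood_scores) >= lam)          # OOD inputs mistaken for ID
    return float(fpr), float(lam)

def relative_reduction(before, after):
    """Fractional drop from `before` to `after`."""
    return (before - after) / before

# Toy scores: ID scores 1..100, four OOD scores.
fpr, lam = fpr_at_tpr(np.arange(1, 101), [0, 3, 7, 50])
imagenet_drop = relative_reduction(58.4, 31.4)   # about 0.46, i.e. 46% relative
```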

ReAct is compatible across architectures (ResNet, MobileNet), normalization schemes (BatchNorm, GroupNorm, WeightNorm), and OOD scoring functions (MSP, ODIN, Mahalanobis).

2. Interleaved Reasoning and Acting in LLMs

The “ReAct” system described in "ReAct: Synergizing Reasoning and Acting in Language Models" (Yao et al., 2022) proposes a framework in which LLMs jointly generate reasoning traces (chain-of-thought) and concrete actions, enabling agents to interface dynamically with external environments and knowledge sources.

At each step $t$, the agent’s context $c_t$ accumulates observations $o_t$ and actions $a_t$. The unified action space $\hat{\mathcal{A}} = \mathcal{A} \cup \mathcal{L}$ comprises environment acts $\mathcal{A}$ (e.g., Search, GoTo) and unbounded natural-language thoughts $\mathcal{L}$.

The LLM implements a policy $\pi(a_t \mid c_t)$, where $a_t \in \hat{\mathcal{A}}$ can be either:

“Thought: …” (reasoning trace), updating context as $c_{t+1} = (c_t, a_t)$ with no new observation;
“Act: …” (environment action), with new observation $o_{t+1}$ and $c_{t+1} = (c_t, a_t, o_{t+1})$.

Representative pseudocode expresses alternation between Thought and Act emission, with context updates reflecting each new line. The agent terminates upon emitting “Finish: $\text{answer}$”.
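The loop described above can be sketched as follows. This is an illustrative skeleton rather than the paper's prompt or code: `llm` and `env_step` are hypothetical callables standing in for a model call and an environment, and the `Obs:` prefix and `max_steps` cap are assumptions.

```python
from typing import Callable, List, Optional

def react_loop(llm: Callable[[str], str],
               env_step: Callable[[str], str],
               question: str,
               max_steps: int = 10) -> Optional[str]:
    """Alternate Thought/Act lines until the model emits 'Finish: ...'."""
    context: List[str] = [f"Question: {question}"]
    for _ in range(max_steps):
        line = llm("\n".join(context)).strip()
        context.append(line)                        # every emitted line joins the context
        if line.startswith("Finish:"):
            return line[len("Finish:"):].strip()    # terminate with the answer
        if line.startswith("Act:"):
            obs = env_step(line[len("Act:"):].strip())
            context.append(f"Obs: {obs}")           # actions add an observation
        # "Thought:" lines extend the context without touching the environment
    return None                                     # step budget exhausted

# Scripted stand-ins for the model and the environment.
script = iter(["Thought: I should look up the capital.",
               "Act: Search[France]",
               "Finish: Paris"])
answer = react_loop(lambda ctx: next(script),
                    lambda action: "The capital of France is Paris.",
                    "What is the capital of France?")
```

The step cap is one simple guard against the looping behavior noted under greedy decoding.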

Benchmarked across QA (HotpotQA, FEVER), interactive games (ALFWorld), and web navigation (WebShop), ReAct outperforms Standard, CoT-only, and Act-only prompting modes:

On ALFWorld: ReAct achieves a 71% success rate vs. 37% for the BUTLER imitation-learning baseline.
On WebShop: ReAct reaches a 40.0% success rate (score 66.6), outperforming IL (29.1%, score 59.9) and IL+RL (28.7%, score 62.4).

Further, ReAct reduces hallucinations in multi-hop QA, attaining higher rates of true-positive grounded reasoning traces compared to plain CoT (94% vs. 86%) and lower false-positive rate (6% vs. 14%) in error analysis.

Interpretability is enhanced by complete Thought/Act/Obs trajectories, enabling human inspection and intervention.

Limitations include context window constraints for in-context learning, “looping” reactions under greedy decoding, and manual prompt authoring overheads. Extensions include fine-tuning on annotated ReAct runs, integration with RL and human feedback, and scaling to broader APIs.

3. In-Network AR-DDoS Defense with Asymmetric Routing

In "ReAct: Reflection Attack Mitigation For Asymmetric Routing" (Hay et al., 10 Jan 2026), ReAct denotes a novel in-network system for mitigating Amplification Reflection DDoS (AR-DDoS) attacks on stateless protocols, resolving the challenge of asymmetric request-response paths in modern data-plane networks.

Conventional in-network filters (e.g., Poseidon, Jaqen, DIDA) rely on symmetric routing; requests and responses must traverse the same switch, or legitimate traffic suffers drops under asymmetry.

ReAct deploys three programmable data-plane tables per switch:

Requests: A sliding window of Bloom filters (BFs), each with a fixed size in bits and a fixed number of hash functions, that together persist all requests seen over a fixed number of seconds.
Request_Forwarding_Table: Maps client-IP prefixes to downstream switch IDs for request cloning and forwarding.
Forwarded_Requests: Associates transaction IDs with the broadcasting upstream switch(es).

Three correlation flows are distinguished:

Symmetric: requests and responses through the same switch; direct BF insert and test.
Asymmetric: upstream clones requests to downstream switches, which log the request, then, upon seeing responses, reply with forwarding rules to upstream, enabling correct correlation.
Broadcast: When routing changes, broadcast retried requests to all switches, dynamically learning forwarding rules.

Pseudocode for packet handling leverages BF insertions and forwarding actions based on precise header markings and switch IDs. Metadata headers extend packets with a 2-bit mark (null, forward, broadcast, forward_rule) and 16-bit switch_id encoding.
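A small sketch of how such a metadata header could be encoded. Only the field widths and the four mark names come from the description above; the numeric values assigned to each mark, the bit order, and the padding to three bytes are assumptions for illustration.

```python
from enum import IntEnum
from typing import Tuple

class Mark(IntEnum):
    """2-bit packet mark; the numeric assignment is assumed."""
    NULL = 0
    FORWARD = 1
    BROADCAST = 2
    FORWARD_RULE = 3

def pack_metadata(mark: Mark, switch_id: int) -> bytes:
    """Pack the mark and a 16-bit switch_id into 3 bytes (18 bits used)."""
    if not 0 <= switch_id < (1 << 16):
        raise ValueError("switch_id must fit in 16 bits")
    value = (int(mark) << 16) | switch_id
    return value.to_bytes(3, "big")

def unpack_metadata(raw: bytes) -> Tuple[Mark, int]:
    """Inverse of pack_metadata."""
    value = int.from_bytes(raw, "big")
    return Mark((value >> 16) & 0b11), value & 0xFFFF

header = pack_metadata(Mark.BROADCAST, 513)
```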

Bloom filter design adheres to round-robin rotation, ensuring queries exclude the filter currently being cleaned. Each BF has a false-positive rate set by its size, hash count, and load, and the overall misclassification rate follows from the filters queried together.
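The rotation scheme can be illustrated with a software model. This is a simplified sketch, not the data-plane implementation: the class name, the BLAKE2b-based hashing, the one-byte-per-bit storage, and the instantaneous wipe are all assumptions. `bloom_fp_rate` is the textbook approximation for a single Bloom filter and is not necessarily the expression used in the paper.

```python
import hashlib
import math

class SlidingWindowBloom:
    """Round-robin window of Bloom filters; the filter being cleaned is never queried."""

    def __init__(self, num_filters: int, bits: int, num_hashes: int):
        if num_filters < 2:
            raise ValueError("need at least one active and one cleaning filter")
        self.bits, self.num_hashes = bits, num_hashes
        self.filters = [bytearray(bits) for _ in range(num_filters)]  # one byte per bit
        self.active = 0                                # filter receiving inserts

    @property
    def cleaning(self) -> int:
        return (self.active + 1) % len(self.filters)  # next filter in the rotation

    def _positions(self, key: bytes):
        for i in range(self.num_hashes):
            d = hashlib.blake2b(key, digest_size=8, salt=i.to_bytes(2, "big")).digest()
            yield int.from_bytes(d, "big") % self.bits

    def insert(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.filters[self.active][pos] = 1

    def contains(self, key: bytes) -> bool:
        positions = list(self._positions(key))
        return any(all(f[p] for p in positions)
                   for i, f in enumerate(self.filters) if i != self.cleaning)

    def rotate(self) -> None:
        """Start a new epoch: the cleaned filter becomes active, the oldest is wiped."""
        self.active = self.cleaning
        self.filters[self.cleaning] = bytearray(self.bits)

def bloom_fp_rate(bits: int, num_hashes: int, items: int) -> float:
    """Textbook false-positive approximation (1 - e^{-kn/m})^k for one filter."""
    return (1.0 - math.exp(-num_hashes * items / bits)) ** num_hashes
```

With three filters, a request stays visible for the epoch in which it was inserted plus one more, then disappears when its filter is wiped. Unlike a counting Bloom filter, no per-item deletion is needed.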

Automatic adaptation is achieved by exploiting application-level retransmissions—clients that fail to receive a response retry, triggering broadcast and subsequent learning of new forwarding rules, obviating additional control-plane burdens.

Evaluation on Lucid P4, Intel Tofino, and NVIDIA BlueField-3 demonstrates robust AR-DDoS defense:

BlueField-3: zero legitimate response drops for up to 8.07M forged replies/s per core.
Lucid/P4 asymmetric: 97–98% of attack traffic is dropped, with the remainder passing as false negatives.
Broadcast overhead diminishes (<7%) after bootstrapping.

Comparisons indicate that ReAct eliminates legitimate flow drops seen in CBF baselines under high attack-to-legitimate ratios and achieves near-zero false positives for attacks, independent of system delays.

Lessons highlight the superiority of sliding-window BFs over counting-BF deletion models (mitigating birthday-paradox exploits), and the effectiveness of retransmission-borne bootstrapping for topology dynamics. Limitations include dependence on fixed transaction IDs (DNS, Memcached), programmable switch path coverage, and bootstrapping latency shaped by client timeouts; extensions target generalization to additional AR-DDoS protocols and INT-driven rule management.

4. Algorithmic and Mathematical Structures

Underlying all ReAct frameworks is algorithmic and mathematical formalism tailored to each domain:

Rectified Activations: Clipping operation is strictly elementwise, applied post hoc, with thresholds empirically set and theoretically justified via stochastic models of activation distributions; scoring functions are mathematically explicit (MSP, energy, Mahalanobis).
LLM ReAct: Inference operates as contextual Markovian policy generation, with interleaved updates and context management; sequence-level metrics (EM, accuracy, success rate) are statistically analyzed.
Networking ReAct: Sliding-window BFs provide probabilistic membership guarantees and scalable asymmetry correction; system-level metrics (attack filtering rate, broadcast overhead) link operational parameters (number of Bloom filters, filter size, hash count, retention window) to observed security outcomes.

5. Practical Applications and Domain Impact

OOD Detection: ReAct offers minimal architectural disruption for robust classification, increasing deployment safety in vision models and broader ML systems.
LLM Reasoning+Acting: ReAct unifies agent control for both decision making (WebShop, ALFWorld) and multi-hop QA, with interpretability and error-tracing conducive to practical interactive agents.
Network Security: ReAct directly addresses the AR-DDoS amplification threat with dynamic adaptation, near-zero legitimate drop rates under network churn and attack, suitable for multi-switch, high-throughput settings.

6. Comparative Merits and Limitations

In each technical context, ReAct advances the state of the art by overcoming major practical and theoretical obstacles:

In OOD detection, clipping corrects unchecked overconfidence with competitive empirical and theoretical guarantees.
In LLM agent design, the combined Thought+Act design enhances both performance and transparency, though at the cost of prompt engineering and occasional unproductive looping.
In network AR-DDoS defense, ReAct’s Bloom-filter-based, self-adaptive cross-switch protocol sustains legitimate flow integrity under dynamic, asymmetric routing, a capability unattainable by prior symmetric-only solutions.

Limitations remain: calibration of thresholds in OOD, context-length and iterative prompt construction in LLMs, transaction-ID constraints and bootstrapping latency in networks. Future work targets more general feature regularization, scalable multi-task LLM agent integration, and protocol-agnostic network defense.

7. Technical Summary Table

Domain	Fundamental Mechanism	Principal Outcome
OOD Detection	Hidden activation clipping	Lowered FPR95, improved AUROC
LLM Reason+Act	Interleaved reasoning/action	Higher task success/interpretable traces
AR-DDoS Defense	Bloom filter cross-correlation	Near-zero legitimate drops/adaptive routing

Each entry reflects direct claims and data from (Sun et al., 2021, Yao et al., 2022, Hay et al., 10 Jan 2026).

8. Research Significance

Across independent technical spheres, “ReAct” denotes concise, modular strategies for enhancing reliability, controllability, and robustness—via minimalist architectural interventions in machine learning, interpretable agent workflow in LLMs, and programmable network security in high-throughput, dynamically routed infrastructures. The frameworks are characterized by plug-and-play compatibility, clear performance thresholds, and data-driven self-adaptation, collectively setting a foundation for safe, tractable, and accountable deployment of ML and networking systems.

References (3)

ReAct: Out-of-distribution Detection With Rectified Activations (2021)

ReAct: Synergizing Reasoning and Acting in Language Models (2022)

ReAct: Reflection Attack Mitigation For Asymmetric Routing (2026)
