ProxFly: Unikernel TCP & RL Quadcopter Control

Updated 7 February 2026

ProxFly is a dual-system framework combining a unikernel-based, on-the-fly TCP acceleration architecture and a residual RL-enhanced quadcopter control system.
The networking component employs early SYN forwarding and proxy chaining via Miniproxy, achieving up to 37.5% reduced transfer times with markedly lower memory usage.
The aerial robotics module augments a cascaded controller with residual reinforcement learning, significantly reducing position and attitude errors under variable disturbances.

ProxFly encompasses two distinct systems from network and robotics research: (1) a unikernel-based, on-the-fly TCP acceleration architecture built on Miniproxy and (2) a robust quadcopter control framework leveraging residual reinforcement learning for close-proximity flight. Each reports measurable performance gains, resource efficiency, and experimental validation within its own domain (Siracusano et al., 2016; Zhang et al., 2024).

1. Unikernel-Based ProxFly for On-the-Fly TCP Acceleration

Architecture and Cloud OS Integration

ProxFly’s TCP acceleration architecture is implemented using Miniproxy, a Xen unikernel based on MiniOS and a patched lwIP stack. MiniOS provides a paravirtualized, single-address-space kernel with no traditional system calls and achieves boot times on the order of tens of milliseconds. The entire Miniproxy VM is a static ELF image requiring only ~6 MB RAM, which supports massive consolidation. At boot, Miniproxy immediately launches a packet RX/TX loop (via the Xen network backend) and a TCP proxy application that manages each proxied connection as a pair of lwIP protocol control blocks (PCBs). A 12-byte custom TCP option supports explicit proxy chaining by embedding the client–server 4-tuple in the SYN, which makes Early SYN Forwarding possible even for explicit proxies (Siracusano et al., 2016).
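To see why 12 bytes are enough for the 4-tuple, note that two IPv4 addresses (4 bytes each) and two ports (2 bytes each) add up to exactly that size. The sketch below packs and unpacks such a payload. The field order, the function names, and the absence of any option kind/length header are assumptions made for illustration; the actual Miniproxy option layout is not described here.

```python
import ipaddress
import struct

# Assumed layout (hypothetical, not taken from Miniproxy):
# source IP, destination IP, source port, destination port,
# all in network byte order: 4 + 4 + 2 + 2 = 12 bytes.
_FOUR_TUPLE_FMT = "!4s4sHH"


def pack_four_tuple(src_ip, src_port, dst_ip, dst_port):
    """Serialize an IPv4 client-server 4-tuple into 12 bytes."""
    return struct.pack(
        _FOUR_TUPLE_FMT,
        ipaddress.IPv4Address(src_ip).packed,
        ipaddress.IPv4Address(dst_ip).packed,
        src_port,
        dst_port,
    )


def unpack_four_tuple(payload):
    """Recover (src_ip, src_port, dst_ip, dst_port) from the 12-byte payload."""
    src, dst, src_port, dst_port = struct.unpack(_FOUR_TUPLE_FMT, payload)
    return (
        str(ipaddress.IPv4Address(src)),
        src_port,
        str(ipaddress.IPv4Address(dst)),
        dst_port,
    )
```

A downstream proxy that receives such a payload in the SYN would know the final destination immediately, which is what lets it forward the SYN before its own handshake completes.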

TCP Connection Handling Optimizations

ProxFly fundamentally restructures TCP handshake and slow-start timing via the following mechanisms:

Split-TCP and Naïve Proxy Chaining: Without proxies, the 3-way handshake plus first data incurs latency of $4D$ ( $D$ being the one-way delay). Sequential proxy handshakes do not improve this.
Early SYN Forwarding (ESF): On SYN reception, the proxy immediately forwards the SYN, causing the client–proxy and proxy–server handshakes to overlap. For delays $X_1$ and $X_2$ (such that $X_1 + X_2 = D$ ):

$TTFB_{ESF} = 2D + 2\max(X_1, X_2)$

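In the balanced configuration ( $X_1 = X_2 = D/2$ ), this yields $TTFB_{ESF} = 3D$, a $25\%$ reduction from the $4D$ baseline. With $N$ evenly spaced in-path proxies, each of the $N+1$ segments has one-way delay $D/(N+1)$, so the $\max$ term becomes $D/(N+1)$: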

$TTFB_{ESF} = 2D(1 + \frac{1}{N+1})$

Slow Start Acceleration: Parallelization via proxy splitting reduces effective RTT during window ramp-up. For $k$ slow-start slots and $N$ proxies:

$TTC_{split} = 2D + 2\frac{D}{N+1} + 2k\frac{D}{N+1}$
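The timing expressions above can be evaluated directly. The following is a minimal sketch with hypothetical function names; delays may be in any consistent unit. It also shows that proxy placement matters: an off-center proxy gains less than a balanced one.

```python
def ttfb_no_proxy(d):
    """Baseline: 3-way handshake plus first data, 4D."""
    return 4 * d


def ttfb_esf(x1, x2):
    """Early SYN Forwarding with one proxy splitting the path into X1 + X2 = D."""
    d = x1 + x2
    return 2 * d + 2 * max(x1, x2)


def ttfb_esf_chain(d, n):
    """ESF with N evenly spaced proxies: 2D(1 + 1/(N+1))."""
    return 2 * d * (1 + 1 / (n + 1))


def ttc_split(d, n, k):
    """Completion time with k slow-start slots over N evenly spaced proxies."""
    segment = d / (n + 1)
    return 2 * d + 2 * segment + 2 * k * segment


if __name__ == "__main__":
    print(ttfb_no_proxy(100))      # baseline for D = 100
    print(ttfb_esf(50, 50))        # balanced single proxy
    print(ttfb_esf(80, 20))        # off-center proxy: smaller gain
    print(ttc_split(100, 1, 2))    # one proxy, two slow-start slots
```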

Mathematical Model: Boot Time, Resources, and Latency Tradeoffs

Letting $t_{boot}$ denote proxy instantiation time, $M$ RAM per instance, and $C$ the CPU cycles per packet, the end-to-end time to first byte with on-the-fly instantiation is:

$TTFB_{fly} = t_{boot} + 2D\left(1 + \frac{1}{N+1}\right)$

The improvement over the baseline (no proxy) is:

$\Delta TTFB = 2D\left(1 - \frac{1}{N+1}\right) - t_{boot}$

For transfer completion after $k$ RTT slots:

$TTC_{fly} = t_{boot} + 2D + 2(k+1)\frac{D}{N+1}$

$\Delta TTC = 2(k+1)D\left(1 - \frac{1}{N+1}\right) - t_{boot}$
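The model implies a break-even point: on-the-fly instantiation only helps when $t_{boot}$ is smaller than the handshake time it saves. The sketch below evaluates the two deltas and that threshold. The function names are hypothetical and the example values are illustrative rather than measurements.

```python
def delta_ttfb(d, n, t_boot):
    """TTFB saved versus no proxy: 2D(1 - 1/(N+1)) - t_boot."""
    return 2 * d * (1 - 1 / (n + 1)) - t_boot


def delta_ttc(d, n, k, t_boot):
    """Completion time saved after k RTT slots: 2(k+1)D(1 - 1/(N+1)) - t_boot."""
    return 2 * (k + 1) * d * (1 - 1 / (n + 1)) - t_boot


def max_useful_boot_time(d, n):
    """Largest t_boot for which delta_ttfb is still non-negative."""
    return 2 * d * (1 - 1 / (n + 1))


if __name__ == "__main__":
    # Long path: a 12 ms boot is easily amortized.
    print(delta_ttfb(d=100, n=1, t_boot=12))
    # Very short path: the same boot time makes the proxy a net loss.
    print(delta_ttfb(d=5, n=1, t_boot=12))
```

A negative result means booting the proxy costs more than the handshake overlap recovers, which is the case for very short paths.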

A single host with $R$ MB of RAM can run $\lfloor R/M \rfloor$ Miniproxy instances (e.g., $64$ GB RAM yields $\sim$ 10,000 proxies at $6$ MB each, vs. $64$ Linux proxies at $1$ GB each).
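The consolidation arithmetic is a single integer division. This sketch assumes 1 GB = 1024 MB and ignores hypervisor and host overhead, so it gives an upper bound rather than a deployable figure; the function name is hypothetical.

```python
def instances_per_host(host_ram_mb, ram_per_instance_mb):
    """Upper bound on instances that fit in host RAM (overheads ignored)."""
    if ram_per_instance_mb <= 0:
        raise ValueError("ram_per_instance_mb must be positive")
    return host_ram_mb // ram_per_instance_mb


HOST_RAM_MB = 64 * 1024  # 64 GB host, assuming 1 GB = 1024 MB

miniproxy_count = instances_per_host(HOST_RAM_MB, 6)    # 6 MB unikernel
linux_count = instances_per_host(HOST_RAM_MB, 1024)     # 1 GB Linux VM
```

Under these assumptions the unikernel count comes out slightly above 10,000, consistent with the order of magnitude quoted above.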

Quantitative Results

In empirical evaluation, Miniproxy achieved 1.534 Gb/s throughput versus 1.462 Gb/s for Varnish ($\sim$ 5% higher) while using two orders of magnitude less memory. Boot times on $2$–$3$ GHz CPUs are $\sim$ 12 ms with $6$ MB RAM; with up to $8$ MB RAM, $t_{boot} < 60$ ms ( $<230$ ms at $800$ MHz). Median per-flow SYN processing is under $3$ ms for up to 230 concurrent connections. For a 10 KB flow on a path quoted at 100 ms RTT, adding proxies reduces transfer time from $400$ ms (no proxy) to $300$ ms (1 proxy), $266$ ms (2), or $250$ ms (3), an improvement of up to $37.5\%$. Note that these values equal $2D(1 + \frac{1}{N+1})$ evaluated at $D = 100$ ms, i.e., they treat 100 ms as the one-way delay $D$ of the model above. For 25 KB slow-start-dominated transfers, savings rise to $33$–$49\%$ (Siracusano et al., 2016).
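The quoted percentages for the short flow follow from the handshake model: the relative saving is $1 - \frac{2D(1 + 1/(N+1))}{4D} = \frac{1}{2}\left(1 - \frac{1}{N+1}\right)$, which does not depend on $D$. The sketch below (hypothetical function name) evaluates it for small $N$.

```python
def ttfb_improvement_pct(n):
    """Relative TTFB saving of N evenly spaced ESF proxies versus no proxy, in percent."""
    return 100 * 0.5 * (1 - 1 / (n + 1))


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, round(ttfb_improvement_pct(n), 1))
```

Because the expression approaches 50% as $N$ grows, each additional proxy in this model buys less than the previous one.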

Deployment Modes and Operational Guidelines

Principal modes include:

Edge cloud acceleration: Proxies instantiate in under $50$ ms near the client.
Massive isolation: $\sim 10^4$ Miniproxy instances per host (e.g., $64$ GB RAM at $6$ MB each) for per-flow/tenant separation.
Just-in-time provisioning: Boot on demand using orchestration triggered by SYN packet observation; balance placement for maximum latency gain.

Guidelines include memory tuning (6–8 MB per instance for $\sim 10^4$ flows), CPU provisioning ( $>1$ Gb/s per 3 GHz core), security reinforcement (unikernel surface minimization; authenticate explicit proxy SYN options), rate-limited instantiation to prevent storm effects, and careful path selection to avoid offsetting gains with path-length increases.
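One way to realize rate-limited instantiation is a token bucket placed in front of the boot path. The sketch below is an illustration of that general technique, not the orchestration logic of the original system; the class name, parameters, and injectable clock are all assumptions.

```python
import time


class BootRateLimiter:
    """Token bucket: on average `rate` boots per second, with bursts up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)
        self.burst = float(burst)
        self.clock = clock          # injectable so the limiter can be tested
        self.tokens = float(burst)  # start with a full bucket
        self.last = clock()

    def allow_boot(self):
        """Return True if a new proxy may be booted now, consuming one token."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A SYN that arrives when `allow_boot()` returns `False` could simply be forwarded unproxied, so a SYN flood degrades to baseline behavior instead of triggering a boot storm.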

2. Residual RL-Based ProxFly for Close Proximity Quadcopter Control

Quadcopter Dynamics and Baseline Control

The control framework models standard Newton–Euler rigid-body dynamics in world-frame position $p\in\mathbb{R}^3$ and body-frame angular velocity $\omega\in\mathbb{R}^3$ :

$m\,\ddot p = m g e_3 + R(\phi,\theta,\psi)\,T + d_{\rm ext},$

$I\,\dot\omega + \omega \times (I\,\omega) = \tau + \tau_{\rm ext}$

with $m$ mass, $R$ rotation matrix, $T$ thrust, $\tau$ torque, and $d_{\rm ext}, \tau_{\rm ext}$ representing disturbances (notably, aerodynamic downwash in close-proximity flight).
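As a concrete reading of the two equations, the sketch below solves each for the acceleration term. It assumes a diagonal inertia matrix and takes the rotated thrust $R\,T$ as an already computed world-frame vector; both simplifications and all function names are assumptions for illustration. With the $+m g e_3$ sign used above, hover corresponds to a world-frame thrust of $-m g$ along $e_3$.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def linear_accel(m, g, thrust_world, d_ext):
    """p_ddot from m p_ddot = m g e3 + R T + d_ext (thrust_world is R T)."""
    gravity = (0.0, 0.0, m * g)
    return tuple((gravity[i] + thrust_world[i] + d_ext[i]) / m for i in range(3))


def angular_accel(inertia_diag, omega, tau, tau_ext):
    """omega_dot from I omega_dot + omega x (I omega) = tau + tau_ext, diagonal I."""
    i_omega = tuple(inertia_diag[i] * omega[i] for i in range(3))
    gyro = cross(omega, i_omega)
    return tuple(
        (tau[i] + tau_ext[i] - gyro[i]) / inertia_diag[i] for i in range(3)
    )
```

In this form a downwash disturbance enters as a nonzero `d_ext` (and possibly `tau_ext`), which is the term the controller has to reject.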

The cascaded controller consists of:

Outer loop (position ⟶ thrust/attitude): Given position/velocity errors, applies PD-style law to output normalized thrust $c_{\rm cas}$ and desired attitude $R_{\rm des}$ .
Inner loop (attitude ⟶ body rates): Computes attitude error and outputs compensation body rates $\omega_{\rm cas}$ .
Basic command vector: $u_{\rm basic} = [c_{\rm cas}, \omega_{\rm cas}]^T \in \mathbb{R}^4$ .

Residual RL Module

ProxFly introduces a residual policy on top of the model-based controller, with the following structure:

Observation space ($20$-dim): Includes current error states, last action, and basic controller output.
Action: Residual on thrust and rates, $u_{\rm res} \in \mathbb{R}^4$ , with clipping to practical actuation bounds.
Reward: Weighted sum of position offset, attitude deviation, thrust/rate penalties, and a survival bonus.
Final command: $u_t = u_{\rm basic}(s_t) + u_{\rm res}(o_t)$ .
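The composition of the final command can be sketched as follows. Each vector has four entries (thrust plus three body rates). The per-channel residual limits and actuator bounds are hypothetical parameters, since the text states only that the residual is clipped to practical actuation bounds; saturating the summed command as well is an added assumption.

```python
def clip(x, lo, hi):
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def compose_command(u_basic, u_res, res_limit, cmd_lo, cmd_hi):
    """Return u_t = u_basic + clipped u_res, saturated to actuator bounds.

    u_basic   -- output of the cascaded controller [c_cas, omega_cas (3)]
    u_res     -- raw residual from the policy, same layout
    res_limit -- per-channel bound on the residual magnitude (assumed)
    cmd_lo/hi -- per-channel bounds on the final command (assumed)
    """
    command = []
    for basic, res, limit, lo, hi in zip(u_basic, u_res, res_limit, cmd_lo, cmd_hi):
        bounded_res = clip(res, -limit, limit)
        command.append(clip(basic + bounded_res, lo, hi))
    return command
```

Bounding the residual keeps the learned term a correction: if the policy output saturates, the vehicle still receives a command close to what the cascaded controller alone would have issued.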

The actor (policy) and critic (value) networks are both 3-layer MLPs (128 units/layer, LeakyReLU, $\tanh$ output). Training uses Proximal Policy Optimization (PPO) with advantage estimation, clipped surrogate objective, and Adam optimization (Zhang et al., 2024).
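For reference, the clipped surrogate objective used by PPO is $\mathbb{E}\left[\min\left(r_t A_t,\ \mathrm{clip}(r_t, 1-\epsilon, 1+\epsilon)\,A_t\right)\right]$, where $r_t$ is the probability ratio between new and old policies and $A_t$ the advantage estimate. The sketch below computes it for a batch in plain Python. The default $\epsilon = 0.2$ is a commonly used value and an assumption here; the value used for ProxFly is not stated.

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Batch mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = max(1.0 - eps, min(1.0 + eps, r))
        # The min makes the objective pessimistic: large policy changes
        # cannot increase it beyond the clipped value.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)
```

The optimizer maximizes this quantity, so a training loop would typically minimize its negative.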

Domain Randomization and Robustness Strategy

To enforce robustness and rapid adaptation, every episode randomizes:

Mass and inertia (per-episode, up to $\pm50\%$ variation)
Propeller constants (per-motor)
External vertical/horizontal disturbance profiles (triangular waves; amplitude $0.25$–$4$ N)
Additive torque noise (Gaussian)

This broad parameter sweep forces the residual policy to generalize across identification errors and uncertain/unmodeled turbulence.
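A per-episode sampler for the randomized quantities listed above might look like the sketch below. Only the $\pm50\%$ mass/inertia range and the $0.25$–$4$ N disturbance amplitude come from the text; the propeller-constant range, wave period, torque-noise scale, and all names are hypothetical placeholders.

```python
import random


def triangular_wave(t, period, amplitude):
    """Symmetric triangle wave in [-amplitude, amplitude], peak at t = 0."""
    phase = (t / period) % 1.0
    return amplitude * (4.0 * abs(phase - 0.5) - 1.0)


def sample_episode(rng, nominal_mass, nominal_inertia):
    """Draw one set of randomized physical parameters for an episode."""
    return {
        "mass": nominal_mass * rng.uniform(0.5, 1.5),
        "inertia": [i * rng.uniform(0.5, 1.5) for i in nominal_inertia],
        # Hypothetical +/-10% spread, drawn independently per motor.
        "prop_scale": [rng.uniform(0.9, 1.1) for _ in range(4)],
        "dist_amplitude_n": rng.uniform(0.25, 4.0),
        "dist_period_s": rng.uniform(1.0, 5.0),   # hypothetical range
        "torque_noise_std": 1e-3,                 # hypothetical scale, N*m
    }


def torque_noise(rng, std):
    """Additive zero-mean Gaussian torque noise on each body axis."""
    return [rng.gauss(0.0, std) for _ in range(3)]
```

Passing an explicit `random.Random(seed)` instance as `rng` keeps episodes reproducible, which helps when comparing training runs.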

Experimental Validation

Simulation: Two-vehicle scenarios with high-fidelity downwash (Karana et al. model) test at separations $0.25$–$0.75$ m, measuring altitude error, attitude RMSE, and control residuals. The RL-added residuals correct for steady-state errors induced by downwash, despite never having seen the Karana model in training.

Real-World Experiments: Using precision motion capture and multi-rate control loops (high-level $50$ Hz, low-level $500$ Hz), three controllers are compared: the basic cascaded controller, a finely tuned model-based downwash compensator (FB-AeroComp), and ProxFly. Each table cell lists position RMSE $E_{\rm pos}$ (m) followed by attitude RMSE $E_{\rm att}$ (rad):

Task	Basic	FB-AeroComp	ProxFly
Hovering	0.1199 m, 0.1710 rad	0.1113 m, 0.1818 rad	0.0882 m, 0.0794 rad
Circling (same)	0.1867 m, 0.1976 rad	0.0832 m, 0.1238 rad	0.1385 m, 0.1252 rad
Circling (rev)	0.1451 m, 0.1714 rad	0.0983 m, 0.0930 rad	0.0940 m, 0.0996 rad
Average	0.1506 m, 0.1800 rad	0.0976 m, 0.1329 rad	0.1069 m, 0.1014 rad

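Relative to the basic controller, ProxFly reduces mean position error by approximately $29\%$ and mean attitude error by approximately $44\%$. It outperforms FB-AeroComp on both metrics in hover and has the lowest average attitude error, although FB-AeroComp retains a lower average position error ($0.0976$ m vs. $0.1069$ m).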
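The percentage reductions can be recomputed from the Average row of the table. The sketch below copies those values verbatim; the dictionary and function names are hypothetical.

```python
# (E_pos in m, E_att in rad), copied from the Average row of the table.
AVERAGE_RMSE = {
    "Basic": (0.1506, 0.1800),
    "FB-AeroComp": (0.0976, 0.1329),
    "ProxFly": (0.1069, 0.1014),
}


def reduction_pct(baseline, value):
    """Relative reduction of `value` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - value) / baseline


basic_pos, basic_att = AVERAGE_RMSE["Basic"]
prox_pos, prox_att = AVERAGE_RMSE["ProxFly"]

pos_reduction = reduction_pct(basic_pos, prox_pos)
att_reduction = reduction_pct(basic_att, prox_att)
```

Swapping in the `"FB-AeroComp"` entry as the baseline shows the mixed picture against the model-based compensator: a gain in attitude error but not in position error.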

In-Air Docking: The system handles rapid load/inertia change and strong turbulence when a small quadcopter drops $5$ cm onto a larger hovering vehicle. Stability is maintained: the residual thrust response peaks and then returns to nominal.

Discussion, Limitations, and Prospects

By learning only the residual atop a validated controller, ProxFly achieves interpretable, sample-efficient policy refinement with reduced “black box” exposure and avoids the need for inter-vehicle communication. However, high-frequency oscillations in the residual output may stress hardware; extreme parameter mismatch remains a possible failure point; and broader physical generalization awaits further demonstration. Possible extensions include online residual smoothing (system ID), integration of perceptual sensing, and multi-agent explicit intent sharing (Zhang et al., 2024).

3. Broader Significance and Impact

ProxFly, across domains, exemplifies minimalist, high-performance system design:

In networking, it demonstrates that high-frequency, on-the-fly instantiation of TCP-accelerating proxies is practical and efficient, even at per-flow granularity.
In aerial robotics, it establishes a pathway whereby model-based control augmented by RL-trained residuals yields robust, communication-minimal, and hardware-constrained high-agility behaviors previously only achievable via purpose-built, model-tuned controllers.

Both systems explicitly target real-world deployment scenarios: edge-cloud TCP orchestration and safe, reliable close-proximity quadcopter maneuvers, including aggressive actions such as mid-air docking.

4. Trade-offs, Operational Constraints, and Deployment Considerations

Resource provisioning: In ProxFly-TCP, small RAM and fast boot times permit massive scalability and per-tenant isolation; in ProxFly-RL, computation and memory are dictated by actor-critic MLP inference rates and sensor/control latency budgets.
Security: The unikernel architecture of Miniproxy naturally reduces the exploit surface; the ProxFly quadrotor system can add further safety-assurance layers, though the RL residual could in principle produce unexpected artifacts in the command time series.
Latency/throughput scalability: In networks, orchestration operations must not offset handshake/slow-start gains; in RL flight, the controller must maintain closed-loop frequency under typical wireless/CPU burdens.
Limitations: For both, edge cases (mini-flows in TCP, extreme mass/turbulence in drones) may undercut net benefits, and orchestration or parameter tuning complexity may grow with deployment scale.

5. Summary

ProxFly, as instantiated in both TCP acceleration and quadcopter control, represents minimalist, resource-efficient, high-performance solutions supported by both empirical and theoretical analysis. Key elements are (1) rapid, lightweight proxy instantiation for fine-grained, on-demand TCP acceleration with verified resource and latency savings (Siracusano et al., 2016), and (2) residual RL augmentation of interpretable controllers, yielding robust close-proximity flight with performance comparable to specialized model-based compensators, without the need for inter-agent communication (Zhang et al., 2024).

For further technical and implementation details, consult the original references and supporting codebase (Siracusano et al., 2016, Zhang et al., 2024).

References

Siracusano et al. (2016). On-the-Fly TCP Acceleration with Miniproxy.

Zhang et al. (2024). ProxFly: Robust Control for Close Proximity Quadcopter Flight via Residual Reinforcement Learning.
