ProxFly: Unikernel TCP & RL Quadcopter Control
- ProxFly is a dual-system framework combining a unikernel-based, on-the-fly TCP acceleration architecture and a residual RL-enhanced quadcopter control system.
- The networking component employs early SYN forwarding and proxy chaining via Miniproxy, achieving up to 37.5% reduced transfer times with markedly lower memory usage.
- The aerial robotics module augments a cascaded controller with residual reinforcement learning, significantly reducing position and attitude errors under variable disturbances.
ProxFly encompasses two distinct, high-efficiency systems in network and robotics research: (1) a unikernel-based, on-the-fly TCP acceleration architecture built on Miniproxy and (2) a robust quadcopter control framework leveraging residual reinforcement learning for close-proximity flight. Each system is validated experimentally against established baselines, with measured gains in latency, resource efficiency, and tracking accuracy (Siracusano et al., 2016, Zhang et al., 2024):
- The proxy is compared with Varnish and with unproxied paths.
- The controller is compared with a basic cascaded controller and with a tuned model-based downwash compensator.
1. Unikernel-Based ProxFly for On-the-Fly TCP Acceleration
Architecture and Cloud OS Integration
ProxFly’s TCP acceleration architecture is implemented using Miniproxy, a Xen unikernel based on MiniOS and a patched lwIP stack. MiniOS provides a paravirtualized, single-address-space kernel with no traditional system calls and achieves boot times on the order of tens of milliseconds. The entire Miniproxy VM is a static ELF image requiring only ~6 MB RAM, supporting massive consolidation. At boot, Miniproxy immediately launches a packet RX/TX loop (via the Xen network backend) and a TCP proxy application managing each proxied connection as a pair of lwIP protocol control blocks (PCBs). A 12-byte custom TCP option enables explicit proxy chaining by embedding the client–server 4-tuple in the SYN, enabling Early SYN Forwarding even for explicit proxies (Siracusano et al., 2016).
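An IPv4 client–server 4-tuple fits exactly in 12 bytes: two 4-byte addresses plus two 2-byte ports. The sketch below packs and unpacks such a payload with Python's `struct` and `ipaddress` modules.

Several details are assumptions made for illustration:
- the field order;
- the IPv4-only layout;
- treating the 12 bytes as the option payload, without the TCP option kind/length header.

The function names are hypothetical and are not Miniproxy's API.

```python
import ipaddress
import struct

# Hypothetical payload layout (not taken from Miniproxy's source):
# client IPv4 | server IPv4 | client port | server port, network byte order.
FOUR_TUPLE_FMT = "!4s4sHH"  # 4 + 4 + 2 + 2 = 12 bytes; "!" disables padding


def pack_four_tuple(client_ip: str, server_ip: str, client_port: int, server_port: int) -> bytes:
    """Encode the original client-server 4-tuple into a 12-byte option payload."""
    return struct.pack(
        FOUR_TUPLE_FMT,
        ipaddress.IPv4Address(client_ip).packed,
        ipaddress.IPv4Address(server_ip).packed,
        client_port,
        server_port,
    )


def unpack_four_tuple(payload: bytes) -> tuple:
    """Decode a 12-byte payload into (client_ip, server_ip, client_port, server_port)."""
    expected = struct.calcsize(FOUR_TUPLE_FMT)
    if len(payload) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(payload)}")
    c_ip, s_ip, c_port, s_port = struct.unpack(FOUR_TUPLE_FMT, payload)
    return (str(ipaddress.IPv4Address(c_ip)), str(ipaddress.IPv4Address(s_ip)), c_port, s_port)


if __name__ == "__main__":
    opt = pack_four_tuple("10.0.0.1", "192.0.2.7", 51000, 443)
    print(len(opt), unpack_four_tuple(opt))
```

A proxy that receives such a SYN can read the real destination from the payload and forward the SYN at once. This is what makes Early SYN Forwarding possible for explicit proxies.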
TCP Connection Handling Optimizations
ProxFly fundamentally restructures TCP handshake and slow-start timing via the following mechanisms:
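- Split-TCP and Naïve Proxy Chaining: Without proxies, the 3-way handshake plus the first data exchange takes $4D$, with $D$ the one-way client–server delay. Naïve split-TCP proxies that complete each handshake sequentially do not improve on this.
- Early SYN Forwarding (ESF): On receiving a SYN, the proxy immediately forwards it, so the client–proxy and proxy–server handshakes overlap. For client–proxy delay $D_1$ and proxy–server delay $D_2$ (with $D_1 + D_2 = D$):
$$\mathrm{TTFB}_{\mathrm{ESF}} = 2D + 2\max(D_1, D_2).$$
In the balanced configuration ($D_1 = D_2 = D/2$), this gives $\mathrm{TTFB}_{\mathrm{ESF}} = 3D$, a saving of $D$ over the $4D$ baseline. For $n$ evenly spaced in-path proxies:
$$\mathrm{TTFB}_n = 2D + \frac{2D}{n+1}.$$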
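- Slow Start Acceleration: Splitting the path lets each segment ramp up its congestion window over its own, shorter RTT. Take an idealized model with $k$ slow-start slots and $n$ evenly spaced proxies on a path with one-way delay $D$. Each slot then costs one segment RTT, $2D/(n+1)$, instead of the end-to-end $2D$:
$$T_k(n) \approx 2D + \frac{2kD}{n+1}.$$
Without proxies this gives $2D(k+1)$; for $k = 1$ it gives the time to first byte.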
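The latency model can be checked numerically. This minimal sketch assumes an idealized path: no loss, zero processing time, and synchronized window growth. It encodes four expressions:
- $4D$: time to first byte without proxies;
- $2D + 2\max(D_1, D_2)$: one ESF proxy;
- $2D + 2D/(n+1)$: $n$ evenly spaced proxies;
- $2D + 2kD/(n+1)$: completion time for $k$ slow-start rounds.

The function names are hypothetical. Scaling the results to the 400 ms no-proxy figure from the quantitative results reproduces the reported 300, 266, and 250 ms.

```python
def ttfb_no_proxy(d: float) -> float:
    """Handshake (one RTT) plus request/first response (one RTT): 4 one-way delays."""
    return 4.0 * d


def ttfb_esf(d1: float, d2: float) -> float:
    """One proxy with Early SYN Forwarding; d1 = client-proxy, d2 = proxy-server delay."""
    return 2.0 * (d1 + d2) + 2.0 * max(d1, d2)


def ttfb_chain(d: float, n: int) -> float:
    """n evenly spaced ESF proxies on a path with one-way delay d (n = 0: no proxy)."""
    return 2.0 * d + 2.0 * d / (n + 1)


def transfer_time(d: float, n: int, k: int) -> float:
    """Idealized completion time: k slow-start rounds, each paced by one segment RTT."""
    return 2.0 * d + 2.0 * k * d / (n + 1)


if __name__ == "__main__":
    base = ttfb_chain(1.0, 0)  # equals ttfb_no_proxy(1.0)
    for n in (1, 2, 3):
        scaled = 400.0 * ttfb_chain(1.0, n) / base
        print(f"{n} proxies: {scaled:.1f} ms, saving {1 - scaled / 400.0:.1%}")
```

Under the same idealized model, assume (hypothetically) that a transfer needs $k = 2$ rounds. A single proxy then cuts $6D$ to $4D$, a one-third saving.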
Mathematical Model: Boot Time, Resources, and Latency Tradeoffs
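Let $T_{\text{boot}}$ denote proxy instantiation (boot) time, $M$ the RAM per instance, and $C$ the CPU cycles per packet. Assume the first SYN triggers instantiation of $n$ evenly spaced proxies that boot in parallel, and that per-packet processing time is negligible. The end-to-end time to first byte is then approximately:
$$\mathrm{TTFB}_{\mathrm{fly}} \approx T_{\text{boot}} + 2D + \frac{2D}{n+1}.$$
The improvement over the baseline (no proxy, $4D$) is:
$$\Delta \approx \frac{2nD}{n+1} - T_{\text{boot}}.$$
On-the-fly instantiation therefore pays off only when $T_{\text{boot}} < 2nD/(n+1)$. For a single proxy, this means the boot time must be below the one-way delay $D$. For transfer completion after $k$ RTT slots:
$$\Delta_k \approx \frac{2knD}{n+1} - T_{\text{boot}}.$$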
A single host with $R$ MB of RAM can run roughly $\lfloor R/M \rfloor$ Miniproxy instances, where $M$ is the RAM per instance. For example, $64$ GB of RAM holds about 10,000 proxies at $6$ MB each, versus $64$ Linux proxies at $1$ GB each.
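The boot-time trade-off and the consolidation arithmetic reduce to a few lines. This sketch assumes the idealized model described above:
- $n$ evenly spaced proxies boot in parallel on the first SYN;
- per-packet processing is negligible;
- the saving over $k$ slow-start rounds is therefore $2knD/(n+1) - T_{\text{boot}}$.

The 12 ms boot time and the RAM figures come from the text. The 50 ms one-way delay and the function names are hypothetical.

```python
def instances_per_host(host_ram_mb: int, per_instance_mb: int) -> int:
    """Instances that fit in RAM, ignoring hypervisor and dom0 overhead."""
    return host_ram_mb // per_instance_mb


def on_the_fly_gain(d: float, n: int, k: int, t_boot: float) -> float:
    """Idealized saving vs. no proxy (same time unit as d and t_boot).

    Assumes the n proxies boot in parallel on the first SYN and ignores
    per-packet processing.
    """
    return 2.0 * k * n * d / (n + 1) - t_boot


def break_even_boot_time(d: float, n: int, k: int = 1) -> float:
    """Largest boot time for which on-the-fly instantiation still helps."""
    return 2.0 * k * n * d / (n + 1)


if __name__ == "__main__":
    print(instances_per_host(64 * 1024, 6))     # 6 MB Miniproxy instances
    print(instances_per_host(64 * 1024, 1024))  # 1 GB Linux proxy VMs
    # Hypothetical path: D = 50 ms, one proxy, 12 ms boot, first byte only.
    print(on_the_fly_gain(50.0, 1, 1, 12.0), break_even_boot_time(50.0, 1))
```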
Quantitative Results
In empirical evaluation, Miniproxy achieved 1.534 Gb/s throughput versus Varnish’s 1.462 Gb/s (about 5% higher), while using two orders of magnitude less memory. Boot time is 12 ms on $2$–$3$ GHz CPUs with $6$ MB of RAM. Boot times were also measured with up to $8$ MB of RAM and on an $800$ MHz CPU, where booting is slower. Median per-flow SYN processing is under $3$ ms for up to 230 concurrent connections. For a 100 ms RTT path and a 10 KB flow, adding proxies reduces transfer time as follows, an improvement of up to 37.5%:
- no proxy: $400$ ms;
- 1 proxy: $300$ ms;
- 2 proxies: $266$ ms;
- 3 proxies: $250$ ms.

For 25 KB slow-start-dominated transfers, savings are at least $33$% (Siracusano et al., 2016).
Deployment Modes and Operational Guidelines
Principal modes include:
- Edge cloud acceleration: Proxies instantiate in under $50$ ms near the client.
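- Massive isolation: on the order of $10^4$ proxy instances per host, enough for per-flow or per-tenant isolation.

Operational guidelines include:
- memory provisioning ($64$ GB for $\sim 10^4$ flows);
- CPU provisioning (throughput budgeted per $3$ GHz core);
- security hardening (unikernel attack-surface minimization; authentication of explicit-proxy SYN options);
- rate-limited instantiation to prevent instantiation storms;
- careful path selection, so that added path length does not offset the gains.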
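The text recommends rate-limited instantiation but does not specify a mechanism. A token bucket is one standard choice, and the sketch below assumes it. The rate, burst size, and class name are hypothetical. The clock is injectable so the limiter can be tested deterministically.

```python
import time
from typing import Callable


class InstantiationLimiter:
    """Token bucket: on average at most `rate` proxy boots per second,
    with bursts of up to `burst` boots."""

    def __init__(self, rate: float, burst: int, clock: Callable[[], float] = time.monotonic):
        self.rate = float(rate)
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Return True if a new proxy may be booted now, consuming one token."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A flow that is refused a token could simply be forwarded without a proxy, giving up acceleration rather than connectivity. This fallback is a design suggestion, not something the source describes.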
2. Residual RL-Based ProxFly for Close Proximity Quadcopter Control
Quadcopter Dynamics and Baseline Control
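The control framework models standard Newton–Euler rigid-body dynamics in world-frame position $\mathbf{p}$ and body-frame angular velocity $\boldsymbol{\omega}$:
$$m\ddot{\mathbf{p}} = -mg\,\mathbf{e}_3 + f\,\mathbf{R}\mathbf{e}_3 + \mathbf{f}_d, \qquad \dot{\mathbf{R}} = \mathbf{R}\,[\boldsymbol{\omega}]_\times, \qquad \mathbf{J}\dot{\boldsymbol{\omega}} = -\boldsymbol{\omega}\times\mathbf{J}\boldsymbol{\omega} + \boldsymbol{\tau} + \boldsymbol{\tau}_d$$
The symbols are:
- $m$: mass;
- $\mathbf{J}$: inertia matrix;
- $\mathbf{R}$: body-to-world rotation matrix;
- $f$: collective thrust;
- $\boldsymbol{\tau}$: body torque;
- $\mathbf{f}_d, \boldsymbol{\tau}_d$: disturbances (notably, aerodynamic downwash in close-proximity flight).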
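To make the rigid-body model concrete, here is a minimal explicit-Euler integration step in NumPy. It assumes a z-up world frame, thrust acting along the body $z$-axis, and a rotation update via the exponential map so that $\mathbf{R}$ stays orthonormal. It is a generic sketch of Newton–Euler dynamics, not the simulator used by Zhang et al. (2024); the function names are hypothetical.

```python
import numpy as np

E3 = np.array([0.0, 0.0, 1.0])


def hat(w):
    """Skew-symmetric matrix such that hat(w) @ x == np.cross(w, x)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def exp_so3(phi):
    """Rodrigues' formula: rotation matrix for the rotation vector phi."""
    theta = np.linalg.norm(phi)
    K = hat(phi)
    if theta < 1e-9:
        return np.eye(3) + K  # first-order approximation for tiny rotations
    return np.eye(3) + (np.sin(theta) / theta) * K + ((1.0 - np.cos(theta)) / theta**2) * (K @ K)


def step(p, v, R, w, thrust, torque, m, J, dt, g=9.81, f_d=None, tau_d=None):
    """One explicit-Euler step (thrust in N, torque in N*m, optional disturbances)."""
    f_d = np.zeros(3) if f_d is None else f_d
    tau_d = np.zeros(3) if tau_d is None else tau_d
    acc = -g * E3 + (thrust * (R @ E3) + f_d) / m
    w_dot = np.linalg.solve(J, torque + tau_d - np.cross(w, J @ w))
    return p + dt * v, v + dt * acc, R @ exp_so3(w * dt), w + dt * w_dot
```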
The cascaded controller consists of:
- Outer loop (position ⟶ thrust/attitude): Given position/velocity errors, applies a PD-style law to output the normalized thrust $T$ and desired attitude $\mathbf{R}_d$.
- Inner loop (attitude ⟶ body rates): Computes the attitude error $\mathbf{e}_R$ and outputs compensating body rates $\boldsymbol{\omega}_c$.
- Basic command vector: $\mathbf{u}_b = [T, \boldsymbol{\omega}_c^\top]^\top$.
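The cascaded structure can be sketched as a generic geometric controller: PD on position, proportional on the SO(3) attitude error. This controller form is an assumption. The gains, yaw handling, and function names are hypothetical and not taken from the ProxFly implementation. The sketch returns the basic command $[T, \omega_x, \omega_y, \omega_z]$, where $T$ is mass-normalized thrust.

```python
import numpy as np

G = 9.81
E3 = np.array([0.0, 0.0, 1.0])


def vee(S):
    """Inverse of the hat map: extract the vector from a skew-symmetric matrix."""
    return np.array([S[2, 1], S[0, 2], S[1, 0]])


def outer_loop(p, v, p_ref, v_ref, kp=4.0, kd=2.5, yaw_ref=0.0):
    """PD position loop -> mass-normalized thrust T and desired attitude R_d."""
    a_des = kp * (p_ref - p) + kd * (v_ref - v) + G * E3  # desired specific force
    T = np.linalg.norm(a_des)
    z_d = a_des / T  # desired body z-axis
    x_c = np.array([np.cos(yaw_ref), np.sin(yaw_ref), 0.0])
    y_d = np.cross(z_d, x_c)
    y_d /= np.linalg.norm(y_d)
    x_d = np.cross(y_d, z_d)
    return T, np.column_stack((x_d, y_d, z_d))


def inner_loop(R, R_d, k_att=6.0):
    """Attitude loop -> body-rate command from the SO(3) attitude error e_R."""
    e_R = 0.5 * vee(R_d.T @ R - R.T @ R_d)
    return -k_att * e_R


def basic_command(p, v, R, p_ref, v_ref):
    """u_b = [T, wx, wy, wz] from the two cascaded loops."""
    T, R_d = outer_loop(p, v, p_ref, v_ref)
    return np.concatenate(([T], inner_loop(R, R_d)))
```

At hover on target with level attitude, the command is $[g, 0, 0, 0]$. A setpoint in $+x$ produces a positive pitch-rate command, which tilts thrust toward the target.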
Residual RL Module
ProxFly introduces a residual policy on top of the model-based controller, with the following structure:
- Observation space ($20$-dim): Includes current error states, last action, and basic controller output.
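- Action: Residual on thrust and rates, $\mathbf{u}_r = [\Delta T, \Delta\boldsymbol{\omega}^\top]^\top$, clipped to practical actuation bounds.
- Reward: Weighted sum of position offset, attitude deviation, thrust/rate penalties, and a survival bonus.
- Final command: $\mathbf{u} = \mathbf{u}_b + \mathbf{u}_r$, i.e., the basic controller command plus the learned residual.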
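This minimal sketch shows how the residual is combined with the basic command. The clipping bounds are hypothetical. The split of the 20-dimensional observation is also an assumption, chosen to match a 4-dimensional thrust-plus-rates action:
- 12 error-state entries;
- a 4-entry last action;
- the 4-entry basic command.

```python
import numpy as np

# Hypothetical residual bounds for [thrust, roll rate, pitch rate, yaw rate];
# the source states only that residuals are clipped to practical actuation bounds.
RESIDUAL_LIMITS = np.array([2.0, 1.0, 1.0, 0.5])


def build_observation(error_state, last_action, u_basic):
    """Concatenate error states, last action and basic command (assumed 12 + 4 + 4)."""
    obs = np.concatenate((error_state, last_action, u_basic)).astype(float)
    if obs.shape != (20,):
        raise ValueError(f"expected 20 entries, got {obs.shape}")
    return obs


def compose_command(u_basic, u_residual, limits=RESIDUAL_LIMITS):
    """Final command u = u_b + clip(u_r): the policy only nudges the base controller."""
    u_r = np.clip(np.asarray(u_residual, dtype=float), -limits, limits)
    return np.asarray(u_basic, dtype=float) + u_r
```

Clipping before summation bounds how far the learned term can move the command away from the model-based controller. This is what keeps the residual design interpretable.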
The actor (policy) and critic (value) networks are both 3-layer MLPs (128 units per layer, LeakyReLU activations). Training uses Proximal Policy Optimization (PPO) with advantage estimation, a clipped surrogate objective, and Adam optimization (Zhang et al., 2024).
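The networks and the clipped surrogate objective can be sketched in NumPy. The sketch covers only the forward pass and the loss value; actual training would need gradients from an autodiff framework. The following are all assumptions:
- reading "3-layer" as three hidden layers of 128 units;
- the LeakyReLU slope;
- the linear output layer;
- the initialization;
- $\epsilon = 0.2$.

```python
import numpy as np


def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)


def init_mlp(sizes, rng):
    """He-style Gaussian weights and zero biases for each consecutive layer pair."""
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def mlp_forward(params, x):
    """LeakyReLU on hidden layers, linear output layer."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = leaky_relu(x)
    return x


def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Negative PPO clipped surrogate: -mean(min(r*A, clip(r, 1-eps, 1+eps)*A))."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))


rng = np.random.default_rng(0)
actor = init_mlp([20, 128, 128, 128, 4], rng)   # residual mean [dT, dwx, dwy, dwz]
critic = init_mlp([20, 128, 128, 128, 1], rng)  # state-value estimate
```

The `min` in the loss removes any incentive to push the probability ratio beyond $1 \pm \epsilon$ in the direction the advantage favors. This keeps each policy update conservative.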
Domain Randomization and Robustness Strategy
To enforce robustness and rapid adaptation, every episode randomizes:
- Mass and inertia (resampled per episode)
- Propeller constants (per-motor)
- External vertical/horizontal disturbance profiles (triangular waves; amplitude $0.25$–$4$ N)
- Additive torque noise (Gaussian)
This broad parameter sweep forces the residual policy to generalize across identification errors and uncertain/unmodeled turbulence.
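Per-episode randomization can be sketched as a parameter sampler plus a disturbance generator. Only the structure comes from the text:
- per-episode mass and inertia;
- per-motor propeller constants;
- triangular-wave forces with $0.25$–$4$ N amplitude;
- Gaussian torque noise.

The nominal values, spreads, wave frequencies, and the choice of the $x$-axis for the horizontal force are hypothetical.

```python
import numpy as np


def triangle_wave(t, amplitude, freq_hz, phase=0.0):
    """Symmetric triangular wave in [-amplitude, amplitude], starting at +amplitude."""
    frac = (freq_hz * t + phase) % 1.0
    return amplitude * (4.0 * np.abs(frac - 0.5) - 1.0)


def sample_episode(rng, mass=1.0, inertia=(0.01, 0.01, 0.02), spread=0.2,
                   prop_spread=0.1, torque_noise_std=0.005):
    """Draw one episode's randomized parameters (spreads are hypothetical)."""
    return {
        "mass": mass * rng.uniform(1 - spread, 1 + spread),
        "inertia": np.asarray(inertia) * rng.uniform(1 - spread, 1 + spread, size=3),
        "prop_const_scale": rng.uniform(1 - prop_spread, 1 + prop_spread, size=4),  # per motor
        "dist_amp_N": rng.uniform(0.25, 4.0, size=2),   # [vertical, horizontal], range from the text
        "dist_freq_hz": rng.uniform(0.1, 1.0, size=2),  # hypothetical
        "torque_noise_std": torque_noise_std,
    }


def disturbance(ep, t, rng):
    """External force (horizontal along x for simplicity) plus Gaussian torque noise."""
    f_z = triangle_wave(t, ep["dist_amp_N"][0], ep["dist_freq_hz"][0])
    f_x = triangle_wave(t, ep["dist_amp_N"][1], ep["dist_freq_hz"][1])
    tau = rng.normal(0.0, ep["torque_noise_std"], size=3)
    return np.array([f_x, 0.0, f_z]), tau
```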
Experimental Validation
Simulation: Two-vehicle scenarios using a high-fidelity downwash model (Karana et al.) are tested at separations of $0.25$–$0.75$ m. The measured quantities are altitude error, attitude RMSE, and control residuals. The learned residuals correct the steady-state errors induced by downwash, even though the Karana model was never seen during training.
Real-World Experiments: The experiments use precision motion capture and multi-rate control loops (high-level $50$ Hz, low-level $500$ Hz). Three controllers are compared: the basic cascaded controller, a finely tuned model-based downwash compensator (FB-AeroComp), and ProxFly. Metrics are position RMSE (m) and attitude RMSE (rad):
| Task | Basic | FB-AeroComp | ProxFly |
|---|---|---|---|
| Hovering | 0.1199 m, 0.1710 rad | 0.1113 m, 0.1818 rad | 0.0882 m, 0.0794 rad |
| Circling (same) | 0.1867 m, 0.1976 rad | 0.0832 m, 0.1238 rad | 0.1385 m, 0.1252 rad |
| Circling (rev) | 0.1451 m, 0.1714 rad | 0.0983 m, 0.0930 rad | 0.0940 m, 0.0996 rad |
| Average | 0.1506 m, 0.1800 rad | 0.0976 m, 0.1329 rad | 0.1069 m, 0.1014 rad |

Averaged over the three tasks, ProxFly reduces position error relative to the basic controller by approximately 29% and attitude error by approximately 44%. It outperforms FB-AeroComp in hover on both metrics and achieves the lowest average attitude error, while FB-AeroComp retains the lowest average position error.
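The averages and percentage reductions can be recomputed directly from the real-world results table. The short script below does so; its names are illustrative.

```python
# (position RMSE in m, attitude RMSE in rad) for hovering, circling (same), circling (rev),
# transcribed from the real-world results table.
RESULTS = {
    "Basic": [(0.1199, 0.1710), (0.1867, 0.1976), (0.1451, 0.1714)],
    "FB-AeroComp": [(0.1113, 0.1818), (0.0832, 0.1238), (0.0983, 0.0930)],
    "ProxFly": [(0.0882, 0.0794), (0.1385, 0.1252), (0.0940, 0.0996)],
}


def task_average(rows):
    """Mean (position, attitude) RMSE over the tasks."""
    n = len(rows)
    return sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n


def relative_reduction(new, old):
    """Fractional error reduction of `new` relative to `old`."""
    return 1.0 - new / old


if __name__ == "__main__":
    base_pos, base_att = task_average(RESULTS["Basic"])
    for name in ("FB-AeroComp", "ProxFly"):
        pos, att = task_average(RESULTS[name])
        print(f"{name}: position -{relative_reduction(pos, base_pos):.1%}, "
              f"attitude -{relative_reduction(att, base_att):.1%}")
```

Relative to the basic controller, the script gives roughly a 29% position and 44% attitude reduction for ProxFly. For FB-AeroComp it gives roughly 35% and 26%.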
In-Air Docking: A small quadcopter drops $5$ cm onto a larger hovering vehicle. The system absorbs the resulting rapid load/inertia change and strong turbulence without loss of stability; the residual thrust response peaks and then returns to nominal.
Discussion, Limitations, and Prospects
By learning only the residual atop a validated controller, ProxFly achieves interpretable, sample-efficient policy refinement with reduced “black box” exposure. It also avoids the need for inter-vehicle communication.

The approach has limitations:
- high-frequency oscillations in the residual output may stress hardware;
- extreme parameter mismatch remains a possible failure point;
- broader physical generalization awaits further demonstration.

Possible extensions include smoothing of the residual output, online system identification, integration of perceptual sensing, and explicit multi-agent intent sharing (Zhang et al., 2024).
3. Broader Significance and Impact
ProxFly, across domains, exemplifies minimalist, high-performance system design:
- In networking, it demonstrates that high-frequency, on-the-fly instantiation of TCP-accelerating proxies is practical and efficient, even at per-flow granularity.
- In aerial robotics, it shows that model-based control augmented with RL-trained residuals yields robust close-proximity flight on constrained hardware without inter-vehicle communication. Its performance is competitive with purpose-built, model-tuned compensators.
Both systems explicitly target real-world deployment scenarios: edge-cloud TCP orchestration and safe, reliable close-proximity quadcopter maneuvers, including aggressive actions such as mid-air docking.
4. Trade-offs, Operational Constraints, and Deployment Considerations
- Resource provisioning: In ProxFly-TCP, small RAM footprints and fast boot times permit massive scalability and per-tenant isolation. In ProxFly-RL, computation and memory are dictated by the inference cost of the policy (actor) MLP and by sensor/control latency budgets; the critic is needed only during training.
- Security: Miniproxy's unikernel architecture naturally reduces the attack surface. The ProxFly quadrotor system can be wrapped in further safety-assurance layers, though the RL residual could in principle produce unexpected transients in the command signal.
- Latency/throughput scalability: In networks, orchestration operations must not offset handshake/slow-start gains; in RL flight, the controller must maintain closed-loop frequency under typical wireless/CPU burdens.
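- Limitations: For both systems, edge cases may undercut net benefits, and orchestration or parameter-tuning complexity may grow with deployment scale. In TCP, very short flows are such a case, since instantiation time can outweigh handshake savings. In drones, extreme mass or turbulence plays the same role.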
5. Summary
ProxFly, as instantiated in both TCP acceleration and quadcopter control, represents minimalist, resource-efficient, high-performance solutions, validated through both experiments and analytical modeling. Key elements are:
1. rapid, lightweight proxy instantiation for fine-grained, on-demand TCP acceleration, with measured resource and latency savings (Siracusano et al., 2016);
2. residual RL augmentation of interpretable controllers, yielding robust close-proximity flight with performance comparable to specialized model-based compensators and without the need for inter-agent communication (Zhang et al., 2024).
For further technical and implementation details, consult the original references and supporting codebase (Siracusano et al., 2016, Zhang et al., 2024).
References (2)