ProxFly: Unikernel TCP & RL Quadcopter Control

Updated 7 February 2026
  • ProxFly is a dual-system framework combining a unikernel-based, on-the-fly TCP acceleration architecture and a residual RL-enhanced quadcopter control system.
  • The networking component employs early SYN forwarding and proxy chaining via Miniproxy, achieving up to 37.5% reduced transfer times with markedly lower memory usage.
  • The aerial robotics module augments a cascaded controller with residual reinforcement learning, significantly reducing position and attitude errors under variable disturbances.

ProxFly encompasses two distinct, high-efficiency systems in network and robotics research: (1) a unikernel-based, on-the-fly TCP acceleration architecture built on Miniproxy and (2) a robust quadcopter control framework leveraging residual reinforcement learning for close-proximity flight. Each system represents state-of-the-art approaches within its domain, offering significant performance improvements, resource efficiency, and rigorous experimental validation (Siracusano et al., 2016, Zhang et al., 2024).

1. Unikernel-Based ProxFly for On-the-Fly TCP Acceleration

Architecture and Cloud OS Integration

ProxFly’s TCP acceleration architecture is implemented using Miniproxy, a Xen unikernel based on MiniOS and a patched lwIP stack. MiniOS provides a paravirtualized, single-address-space kernel with no traditional system calls and achieves boot times on the order of tens of milliseconds. The entire Miniproxy VM is a static ELF image requiring only ~6 MB RAM, supporting massive consolidation. At boot, Miniproxy immediately launches a packet RX/TX loop (via the Xen network backend) and a TCP proxy application managing each proxied connection as a pair of lwIP protocol control blocks (PCBs). A 12-byte custom TCP option enables explicit proxy chaining by embedding the client–server 4-tuple in the SYN, enabling Early SYN Forwarding even for explicit proxies (Siracusano et al., 2016).

TCP Connection Handling Optimizations

ProxFly fundamentally restructures TCP handshake and slow-start timing via the following mechanisms:

  • Split-TCP and Naïve Proxy Chaining: Without proxies, the 3-way handshake plus first data incurs a latency of $4D$ ($D$ being the one-way delay). Sequential proxy handshakes do not improve this.
  • Early SYN Forwarding (ESF): On SYN reception, the proxy immediately forwards the SYN, causing the client–proxy and proxy–server handshakes to overlap. For delays $X_1$ and $X_2$ (such that $X_1 + X_2 = D$):

$$TTFB_{ESF} = 2D + 2\max(X_1, X_2)$$

In the balanced configuration ($X_1 = X_2 = D/2$), this yields a $25\%$ TTFB reduction ($3D$). For $N$ evenly spaced in-path proxies:

$$TTFB_{ESF} = 2D\left(1 + \frac{1}{N+1}\right)$$

  • Slow Start Acceleration: Parallelization via proxy splitting reduces effective RTT during window ramp-up. For $k$ slow-start slots and $N$ proxies:

$$TTC_{split} = 2D + 2\frac{D}{N+1} + 2k\frac{D}{N+1}$$
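
The timing expressions above can be evaluated directly. The following is a minimal sketch of the three formulas; the function names and the 50 ms one-way delay used in the demo are illustrative assumptions, not taken from the original work.

```python
def ttfb_esf_single(d, x1):
    """TTFB with one ESF proxy at one-way delay x1 from the client (x2 = d - x1)."""
    if not 0 <= x1 <= d:
        raise ValueError("x1 must lie between 0 and d")
    return 2 * d + 2 * max(x1, d - x1)


def ttfb_esf_even(d, n):
    """TTFB with n evenly spaced ESF proxies; n = 0 gives the 4D baseline."""
    return 2 * d * (1 + 1 / (n + 1))


def ttc_split(d, n, k):
    """Completion time with n evenly spaced proxies and k slow-start slots."""
    segment = d / (n + 1)  # one-way delay of each split segment
    return 2 * d + 2 * segment + 2 * k * segment


if __name__ == "__main__":
    d = 0.050  # assumed one-way delay in seconds (illustrative only)
    for x1 in (0.010, 0.025, 0.040):
        print(f"x1={x1:.3f}s -> TTFB={ttfb_esf_single(d, x1):.3f}s")
    for n in range(4):
        print(f"N={n}: TTFB={ttfb_esf_even(d, n):.4f}s  "
              f"TTC(k=3)={ttc_split(d, n, 3):.4f}s")
```

Because the single-proxy expression depends on the larger of the two segment delays, an off-center proxy gives a smaller gain than the balanced placement.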

Mathematical Model: Boot Time, Resources, and Latency Tradeoffs

Letting $t_{boot}$ denote proxy instantiation time, $M$ the RAM per instance, and $C$ the CPU cycles per packet, the end-to-end time to first byte with on-the-fly instantiation is:

$$TTFB_{fly} = t_{boot} + 2D\left(1 + \frac{1}{N+1}\right)$$

The improvement over the baseline (no proxy) is:

$$\Delta TTFB = 2D\left(1 - \frac{1}{N+1}\right) - t_{boot}$$

For transfer completion after $k$ RTT slots:

$$TTC_{fly} = t_{boot} + 2D + 2(k+1)\frac{D}{N+1}$$

$$\Delta TTC = 2(k+1)D\left(1 - \frac{1}{N+1}\right) - t_{boot}$$
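
The trade-off between boot time and latency gain can be checked numerically. This is a small sketch of the model above; the helper names, the 50 ms one-way delay and the 12 ms boot time in the demo are assumptions chosen for illustration.

```python
def ttfb_fly(d, n, t_boot):
    """Time to first byte when n proxies are booted on the fly."""
    return t_boot + 2 * d * (1 + 1 / (n + 1))


def delta_ttfb(d, n, t_boot):
    """TTFB improvement over the 4D no-proxy baseline (negative means a loss)."""
    return 2 * d * (1 - 1 / (n + 1)) - t_boot


def delta_ttc(d, n, k, t_boot):
    """Completion-time improvement after k RTT slots."""
    return 2 * (k + 1) * d * (1 - 1 / (n + 1)) - t_boot


def break_even_boot(d, n):
    """Largest boot time for which on-the-fly instantiation still helps TTFB."""
    return 2 * d * (1 - 1 / (n + 1))


if __name__ == "__main__":
    d, t_boot = 0.050, 0.012  # assumed values, in seconds
    for n in (1, 2, 3):
        print(f"N={n}: dTTFB={delta_ttfb(d, n, t_boot) * 1e3:.1f} ms, "
              f"break-even t_boot={break_even_boot(d, n) * 1e3:.1f} ms")
```

On-the-fly instantiation pays off for TTFB only while $t_{boot}$ stays below the break-even value, which shrinks with the path delay $D$.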

A single host with $R$ MB RAM can run $R/M$ Miniproxy instances (e.g., $64$ GB RAM yields $\sim$10,000 proxies at $6$ MB each, vs. $64$ Linux proxies at $1$ GB each).

Quantitative Results

In empirical evaluation, Miniproxy achieved 1.534 Gb/s throughput (vs. Varnish’s 1.462 Gb/s), i.e., $\sim$5% higher throughput, with two orders of magnitude less memory usage. Boot times on $2$–$3$ GHz CPUs are $\sim$12 ms for $6$ MB RAM; up to $8$ MB RAM produces $t_{boot} < 60$ ms ($<230$ ms at $800$ MHz). Median per-flow SYN processing is under $3$ ms for up to 230 concurrent connections. For a 100 ms RTT path and 10 KB flow, adding proxies reduces transfer time from $400$ ms (no proxy) to $300$ ms (1 proxy), $266$ ms (2), or $250$ ms (3), an improvement of up to $37.5\%$. For 25 KB slow-start-dominated transfers, savings rise to $33$–$49\%$ (Siracusano et al., 2016).
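
As a plausibility check, the quoted transfer times can be compared with the analytical TTFB model. The sketch below assumes $D = 100$ ms in $2D(1 + 1/(N+1))$, since that value reproduces the quoted figures; this is an observation about the arithmetic only, not a statement about the measurement setup.

```python
D_MS = 100.0  # assumed value of D in milliseconds (reproduces the quoted figures)


def ttfb_ms(n, d_ms=D_MS):
    """Model time to first byte, in ms, with n evenly spaced proxies."""
    return 2 * d_ms * (1 + 1 / (n + 1))


def saving_pct(n, d_ms=D_MS):
    """Relative reduction versus the no-proxy case, in percent."""
    base = ttfb_ms(0, d_ms)
    return 100.0 * (base - ttfb_ms(n, d_ms)) / base


if __name__ == "__main__":
    for n in range(4):
        print(f"N={n}: {ttfb_ms(n):6.1f} ms  saving {saving_pct(n):4.1f}%")
```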

Deployment Modes and Operational Guidelines

Principal modes include:

  • Edge cloud acceleration: Proxies instantiate in under $50$ ms near the client.
  • Massive isolation: $\sim 10^4$ Miniproxy instances per $64$ GB RAM for per-flow/tenant separation.
  • Just-in-time provisioning: Boot on-demand using orchestration triggered by SYN packet observation; balance placement for maximum latency gain.

Guidelines include memory tuning (6–8 MB per instance for $\sim 10^4$ flows), CPU provisioning ($>1$ Gb/s per 3 GHz core), security reinforcement (unikernel surface minimization; authenticate explicit proxy SYN options), rate-limited instantiation to prevent storm effects, and careful path selection to avoid offsetting gains with path-length increases.
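
Rate-limited instantiation could be realized in several ways. One common option is a token bucket in front of the boot trigger, sketched below. The class name, the parameters and the token-bucket choice itself are assumptions for illustration; the source does not specify a mechanism.

```python
import time


class BootRateLimiter:
    """Token bucket that caps how often new proxy instances may be booted."""

    def __init__(self, rate_per_s, burst, clock=time.monotonic):
        self.rate = float(rate_per_s)   # sustained boots per second
        self.capacity = float(burst)    # maximum short-term burst
        self.tokens = float(burst)      # start with a full bucket
        self.clock = clock              # injectable clock, eases testing
        self.last = clock()

    def allow(self):
        """Return True if a boot may proceed now, consuming one token."""
        now = self.clock()
        refill = (now - self.last) * self.rate
        self.tokens = min(self.capacity, self.tokens + refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    limiter = BootRateLimiter(rate_per_s=100, burst=5)  # hypothetical limits
    decisions = [limiter.allow() for _ in range(8)]     # simulated SYN burst
    print(decisions)
```

A SYN that is refused a fresh instance can still be served without acceleration, so a SYN flood degrades to baseline behavior instead of exhausting host memory.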

    2. Residual RL-Based ProxFly for Close Proximity Quadcopter Control

    Quadcopter Dynamics and Baseline Control

    The control framework models standard Newton–Euler rigid-body dynamics in world-frame position $p\in\mathbb{R}^3$ and body-frame angular velocity $\omega\in\mathbb{R}^3$:

    $$m\,\ddot p = m g e_3 + R(\phi,\theta,\psi)\,T + d_{\rm ext},$$

    $$I\,\dot\omega + \omega \times (I\,\omega) = \tau + \tau_{\rm ext}$$

    with $m$ mass, $R$ rotation matrix, $T$ thrust, $\tau$ torque, and $d_{\rm ext}, \tau_{\rm ext}$ representing disturbances (notably, aerodynamic downwash in close-proximity flight).
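
To make the rotational equation concrete, the sketch below solves it for $\dot\omega$ and advances $\omega$ by one explicit Euler step. It assumes a diagonal inertia matrix and uses made-up numerical values; neither the integrator nor the parameters come from the source.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def omega_dot(omega, inertia_diag, tau, tau_ext=(0.0, 0.0, 0.0)):
    """Solve I*omega_dot + omega x (I*omega) = tau + tau_ext for omega_dot.

    Assumes a diagonal inertia matrix given by its three diagonal entries.
    """
    i_omega = tuple(i * w for i, w in zip(inertia_diag, omega))
    gyro = cross(omega, i_omega)
    return tuple((t + te - g) / i
                 for t, te, g, i in zip(tau, tau_ext, gyro, inertia_diag))


def euler_step(omega, inertia_diag, tau, dt, tau_ext=(0.0, 0.0, 0.0)):
    """Advance the body rates by one explicit Euler step of length dt."""
    rate = omega_dot(omega, inertia_diag, tau, tau_ext)
    return tuple(w + dt * r for w, r in zip(omega, rate))


if __name__ == "__main__":
    inertia = (0.010, 0.010, 0.020)  # hypothetical inertia in kg*m^2
    omega = (0.0, 0.0, 0.0)
    for _ in range(5):               # constant roll torque, 2 ms steps
        omega = euler_step(omega, inertia, (0.01, 0.0, 0.0), dt=0.002)
    print(omega)
```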

    The cascaded controller consists of:

    • Outer loop (position ⟶ thrust/attitude): Given position/velocity errors, applies a PD-style law to output normalized thrust $c_{\rm cas}$ and desired attitude $R_{\rm des}$.
    • Inner loop (attitude ⟶ body rates): Computes attitude error and outputs compensation body rates $\omega_{\rm cas}$.
    • Basic command vector: $u_{\rm basic} = [c_{\rm cas}, \omega_{\rm cas}]^T \in \mathbb{R}^4$.

    Residual RL Module

    ProxFly introduces a residual policy on top of the model-based controller, with the following structure:

    • Observation space ($20$-dim): Includes current error states, last action, and basic controller output.
    • Action: Residual on thrust and rates, $u_{\rm res} \in \mathbb{R}^4$, with clipping to practical actuation bounds.
    • Reward: Weighted sum of position offset, attitude deviation, thrust/rate penalties, and a survival bonus.
    • Final command: $u_t = u_{\rm basic}(s_t) + u_{\rm res}(o_t)$.
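
The command composition can be sketched in a few lines. In the example below the residual is clipped per channel before being added to the cascaded controller's output. The bound values, the function names and the choice to clip the residual instead of the summed command are assumptions for illustration.

```python
def clip(value, lower, upper):
    """Clamp value to the closed interval [lower, upper]."""
    return max(lower, min(upper, value))


def compose_command(u_basic, u_res, res_bounds):
    """Return u_t = u_basic + clip(u_res), channel by channel.

    u_basic, u_res: 4-element sequences [thrust, rate_x, rate_y, rate_z].
    res_bounds: per-channel (lower, upper) limits on the residual (assumed).
    """
    if not (len(u_basic) == len(u_res) == len(res_bounds) == 4):
        raise ValueError("expected 4-dimensional commands and bounds")
    return [ub + clip(ur, lo, hi)
            for ub, ur, (lo, hi) in zip(u_basic, u_res, res_bounds)]


if __name__ == "__main__":
    bounds = [(-0.2, 0.2), (-1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0)]  # hypothetical
    u_basic = [0.5, 0.0, 0.0, 0.0]    # cascaded controller output
    u_res = [0.3, 5.0, -5.0, 0.1]     # raw policy output
    print(compose_command(u_basic, u_res, bounds))
```

Bounding the residual keeps the learned term a correction to the model-based command, so the baseline controller retains authority if the policy output saturates.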

    The actor (policy) and critic (value) networks are both 3-layer MLPs (128 units/layer, LeakyReLU, $\tanh$ output). Training uses Proximal Policy Optimization (PPO) with advantage estimation, clipped surrogate objective, and Adam optimization (Zhang et al., 2024).
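
For readers unfamiliar with PPO, the clipped surrogate objective averages $\min(r A, \mathrm{clip}(r, 1-\epsilon, 1+\epsilon) A)$ over a batch, where $r$ is the new-to-old policy probability ratio and $A$ the advantage. The sketch below evaluates it in plain Python; the clip range of 0.2 is a commonly used default and an assumption here, not a value reported in the source.

```python
import math


def ppo_clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy. clip_eps is an assumed default.
    """
    if not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("batch inputs must have equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # The pessimistic (minimum) term removes the incentive to move far
        # from the old policy within a single update.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    print(ppo_clipped_surrogate([math.log(2.0)], [0.0], [1.0]))
    print(ppo_clipped_surrogate([math.log(0.5)], [0.0], [-1.0]))
```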

    Domain Randomization and Robustness Strategy

    To enforce robustness and rapid adaptation, every episode randomizes:

    • Mass and inertia (per-episode, up to $\pm 50\%$ variation)
    • Propeller constants (per-motor)
    • External vertical/horizontal disturbance profiles (triangular waves; amplitude $0.25$–$4$ N)
    • Additive torque noise (Gaussian)

    This broad parameter sweep forces the residual policy to generalize across identification errors and uncertain/unmodeled turbulence.
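
A per-episode sampler along these lines might look as follows. Only the $\pm 50\%$ mass/inertia range and the $0.25$–$4$ N amplitude range come from the text above. The uniform distributions, the propeller-constant range, the wave period range and the noise level are placeholder assumptions.

```python
import random


def triangular_wave(t, amplitude, period):
    """Symmetric triangular wave in [-amplitude, +amplitude], peaking at t = 0."""
    phase = (t / period) % 1.0
    return amplitude * (4.0 * abs(phase - 0.5) - 1.0)


def sample_episode(rng, nominal_mass, nominal_inertia, n_motors=4):
    """Draw one episode's randomized parameters (several ranges are assumed)."""
    return {
        "mass": nominal_mass * rng.uniform(0.5, 1.5),               # +/-50%
        "inertia": [i * rng.uniform(0.5, 1.5) for i in nominal_inertia],
        "prop_scale": [rng.uniform(0.9, 1.1) for _ in range(n_motors)],  # assumed
        "dist_amplitude_n": rng.uniform(0.25, 4.0),                 # 0.25-4 N
        "dist_period_s": rng.uniform(1.0, 5.0),                     # assumed
        "torque_noise_std": 1e-3,                                   # assumed, N*m
    }


def external_force(params, t):
    """Disturbance force at time t for the sampled triangular profile."""
    return triangular_wave(t, params["dist_amplitude_n"], params["dist_period_s"])


def torque_noise(rng, params):
    """Additive Gaussian torque noise for the three body axes."""
    return [rng.gauss(0.0, params["torque_noise_std"]) for _ in range(3)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for a repeatable demo
    episode = sample_episode(rng, nominal_mass=1.0,
                             nominal_inertia=[0.01, 0.01, 0.02])
    print(episode["mass"], external_force(episode, 0.3), torque_noise(rng, episode))
```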

    Experimental Validation

    Simulation: Two-vehicle scenarios with high-fidelity downwash (Karana et al. model) test at separations $0.25$–$0.75$ m, measuring altitude error, attitude RMSE, and control residuals. The RL-added residuals correct for steady-state errors induced by downwash, despite never having seen the Karana model in training.

    Real-World Experiments: Using precision motion capture and multi-rate control loops (high-level $50$ Hz, low-level $500$ Hz), three controllers are compared: the basic cascaded controller, a finely tuned model-based downwash compensator (FB-AeroComp), and ProxFly. Metrics include position RMSE $E_{\rm pos}$ and attitude RMSE $E_{\rm att}$:

    | Task | Basic ($E_{\rm pos}$, $E_{\rm att}$) | FB-AeroComp ($E_{\rm pos}$, $E_{\rm att}$) | ProxFly ($E_{\rm pos}$, $E_{\rm att}$) |
    |---|---|---|---|
    | Hovering | 0.1199 m, 0.1710 rad | 0.1113 m, 0.1818 rad | 0.0882 m, 0.0794 rad |
    | Circling (same) | 0.1867 m, 0.1976 rad | 0.0832 m, 0.1238 rad | 0.1385 m, 0.1252 rad |
    | Circling (rev) | 0.1451 m, 0.1714 rad | 0.0983 m, 0.0930 rad | 0.0940 m, 0.0996 rad |
    | Average | 0.1506 m, 0.1800 rad | 0.0976 m, 0.1329 rad | 0.1069 m, 0.1014 rad |

    ProxFly reduces mean position error (vs. basic) by approximately $29\%$, and mean attitude error by $44\%$, while matching or outperforming FB-AeroComp in hover.
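
The averages and relative reductions follow from the per-task values by simple arithmetic. The snippet below recomputes them from the three task rows; the variable and function names are illustrative.

```python
# Per-task (position RMSE in m, attitude RMSE in rad), in the order
# hovering, circling (same), circling (rev), as listed in the table.
RESULTS = {
    "Basic": [(0.1199, 0.1710), (0.1867, 0.1976), (0.1451, 0.1714)],
    "FB-AeroComp": [(0.1113, 0.1818), (0.0832, 0.1238), (0.0983, 0.0930)],
    "ProxFly": [(0.0882, 0.0794), (0.1385, 0.1252), (0.0940, 0.0996)],
}


def mean_errors(rows):
    """Return (mean position RMSE, mean attitude RMSE) over the tasks."""
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)


def reduction_pct(baseline, value):
    """Relative reduction of value with respect to baseline, in percent."""
    return 100.0 * (baseline - value) / baseline


if __name__ == "__main__":
    basic_pos, basic_att = mean_errors(RESULTS["Basic"])
    for name, rows in RESULTS.items():
        pos, att = mean_errors(rows)
        print(f"{name:12s} pos={pos:.4f} m ({reduction_pct(basic_pos, pos):5.1f}%)  "
              f"att={att:.4f} rad ({reduction_pct(basic_att, att):5.1f}%)")
```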

    In-Air Docking: The system handles rapid load/inertia change and strong turbulence when a small quadcopter drops $5$ cm onto a larger hovering vehicle, without loss of stability; the residual thrust response peaks and then returns to nominal.

    Discussion, Limitations, and Prospects

    By learning only the residual atop a validated controller, ProxFly achieves interpretable, sample-efficient policy refinement with reduced “black box” exposure and avoids the need for inter-vehicle communication. However, high-frequency oscillations in the residual output may stress hardware; extreme parameter mismatch remains a possible failure point; and broader physical generalization awaits further demonstration. Possible extensions include online residual smoothing (system ID), integration of perceptual sensing, and multi-agent explicit intent sharing (Zhang et al., 2024).

    3. Broader Significance and Impact

    ProxFly, across domains, exemplifies minimalist, high-performance system design:

    • In networking, it demonstrates that high-frequency, on-the-fly instantiation of TCP-accelerating proxies is practical and efficient, even at per-flow granularity.
    • In aerial robotics, it establishes a pathway whereby model-based control augmented by RL-trained residuals yields robust, communication-minimal, and hardware-constrained high-agility behaviors previously only achievable via purpose-built, model-tuned controllers.

    Both systems explicitly target real-world deployment scenarios: edge-cloud TCP orchestration and safe, reliable close-proximity quadcopter maneuvers, including aggressive actions such as mid-air docking.

    4. Trade-offs, Operational Constraints, and Deployment Considerations

    • Resource provisioning: In ProxFly-TCP, small RAM and fast boot times permit massive scalability and per-tenant isolation; in ProxFly-RL, computation and memory are dictated by actor-critic MLP inference rates and sensor/control latency budgets.
    • Security: The unikernel architecture of Miniproxy naturally reduces the exploit surface; the ProxFly quadrotor system can add further safety assurance layers, though the RL residual could in principle produce unexpected time-series artifacts.
    • Latency/throughput scalability: In networks, orchestration operations must not offset handshake/slow-start gains; in RL flight, the controller must maintain closed-loop frequency under typical wireless/CPU burdens.
    • Limitations: For both, edge cases (mini-flows in TCP, extreme mass/turbulence in drones) may undercut net benefits, and orchestration or parameter tuning complexity may grow with deployment scale.

    5. Summary

    ProxFly, as instantiated in both TCP acceleration and quadcopter control, represents minimalist, resource-efficient, high-performance solutions, validated by both empirical and theoretical analysis. Key elements are (1) rapid, lightweight proxy instantiation for fine-grained, on-demand TCP acceleration with verified resource and latency savings (Siracusano et al., 2016), and (2) residual RL augmentation of interpretable controllers, yielding robust close-proximity flight with performance matching specialized model-based compensators, without the need for inter-agent communication (Zhang et al., 2024).

    For further technical and implementation details, consult the original references and supporting codebase (Siracusano et al., 2016, Zhang et al., 2024).
