
Wan-Move: Control in Video, Networks & Cloud

Updated 20 February 2026
  • Wan-Move names three distinct frameworks from separate works: motion-controllable video generation, WAN traffic offloading for conferencing, and wide-area cloud VM migration.
  • The video generation approach leverages dense point trajectories and latent feature editing to deliver high fidelity and precise motion control.
  • The networking and cloud systems rely on demand forecasting, joint routing optimization, and on-demand WAN acceleration to reduce cost and energy while keeping the performance impact bounded.

Wan-Move encompasses multiple advanced methods for improving control and efficiency in disparate technological domains, including motion-controllable video generation, WAN traffic offloading in conferencing services, and performance-enhanced wide-area virtual machine migration. The term denotes distinct systems and frameworks introduced in recent academic works, each tackling the challenges of control, resource efficiency, and scalability across video synthesis, real-time communications, and distributed cloud computing.

1. Motion-Controllable Video Generation via Wan-Move

Wan-Move (Chu et al., 9 Dec 2025) is a scalable framework for precise, high-quality motion control in video synthesis. In contrast to existing techniques that employ coarse control primitives (bounding boxes, sparse masks) and rely on auxiliary encoders with limited fine-tuning scalability, Wan-Move directly edits the original image-to-video (I2V) condition features to imbue them with motion-awareness.

Central to the method is the representation of motion using dense point trajectories $p \in \mathbb{R}^{(1+T)\times 2}$, densely sampled across the source frame (for example, a $32\times32$ grid yields up to 1024 tracks). These tracks are projected from image space to the VAE latent space via downsampling factors $f_t$ (temporal) and $f_s$ (spatial), yielding

$$\tilde{p}[0] = p[0]/f_s,\qquad \tilde{p}[n]=\frac{1}{f_t f_s}\sum_{i=(n-1)f_t+1}^{n f_t}p[i],$$

for $n=1,\dots,T/f_t$. Given a standard VAE encoding $z_{\text{image}}$ of the first frame, motion is “painted” into the latent condition by copying the first-frame feature along each trajectory,

$$z_\text{image}[n, h_n, w_n, :] \leftarrow z_\text{image}[0, h_0, w_0, :]$$

with $(h_0,w_0)=\lfloor \tilde{p}[0] \rfloor$ and $(h_n,w_n)=\lfloor \tilde{p}[n] \rfloor$. If multiple tracks map to the same cell, one of them is selected at random to preserve sharpness.
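The projection and feature-replication steps above can be sketched in plain Python. This is an illustrative sketch rather than the authors' implementation: the function names, the (row, column) ordering of track points, the nested-list latent layout `z[n][h][w]`, and the choice to skip tracks that leave the latent grid are all assumptions made for the example.

```python
import math
import random


def project_track(p, f_t, f_s):
    """Map a pixel-space track p (1+T points, each (row, col)) to latent space.

    Frame 0 is scaled by 1/f_s. Each later latent frame n averages the f_t
    pixel-space points it covers and scales the mean by 1/f_s.
    """
    T = len(p) - 1
    assert T % f_t == 0, "this sketch assumes T is a multiple of f_t"
    out = [(p[0][0] / f_s, p[0][1] / f_s)]
    for n in range(1, T // f_t + 1):
        chunk = p[(n - 1) * f_t + 1 : n * f_t + 1]
        mean_row = sum(r for r, _ in chunk) / f_t
        mean_col = sum(c for _, c in chunk) / f_t
        out.append((mean_row / f_s, mean_col / f_s))
    return out


def paint_tracks(z, latent_tracks, rng=random):
    """Copy first-frame features along each latent track, modifying z in place.

    z[n][h][w] is the feature vector at latent frame n, row h, column w.
    Tracks that fall outside the grid are skipped (an assumption here).
    """
    n_frames, height, width = len(z), len(z[0]), len(z[0][0])
    for n in range(1, n_frames):
        candidates = {}  # target cell -> source cells of tracks landing there
        for track in latent_tracks:
            h0, w0 = math.floor(track[0][0]), math.floor(track[0][1])
            hn, wn = math.floor(track[n][0]), math.floor(track[n][1])
            inside = (0 <= h0 < height and 0 <= w0 < width
                      and 0 <= hn < height and 0 <= wn < width)
            if inside:
                candidates.setdefault((hn, wn), []).append((h0, w0))
        for (hn, wn), sources in candidates.items():
            # Several tracks may hit one cell: keep a single random source.
            h0, w0 = rng.choice(sources)
            z[n][hn][wn] = list(z[0][h0][w0])
    return z
```

Picking one source per contested cell, instead of averaging, keeps each copied feature identical to a real first-frame feature, which matches the sharpness argument in the text.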

The resultant spatiotemporal feature map is fed into an off-the-shelf I2V diffusion backbone, such as Wan-I2V-14B, via simple channel concatenation. Only the backbone is fine-tuned, using a standard flow-matching loss—no auxiliary motion encoder or architecture modification is required.

The empirical pipeline operates as follows:

  1. Randomly sample $k$ point trajectories (with $k=0$ in 5% of cases to preserve vanilla I2V predictions, otherwise $k\sim\mathrm{Uniform}[1,200]$), tracked with CoTracker.
  2. Construct the updated latent map to reflect motion via feature replication.
  3. Concatenate the motion-edited condition latent (produced with the frozen VAE encoder) with the backbone input and pass it through the diffusion backbone, which uses decoupled cross-attention for text and CLIP image context.
  4. Optimize with the flow-matching loss

$$L_{\mathrm{FM}}(\theta) = \mathbb{E}_{t, x_t, c} \left[ \| v_\theta(x_t, t, c) - v_t(x_t) \|^2 \right],\quad c=\{z_\text{image}', z_\text{global}, z_\text{text}\},$$

where $z_\text{image}'$ encodes the motion guidance.
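Step 1 of the pipeline, the choice of how many trajectories to condition on, can be sketched as follows. The function names are hypothetical, and treating $\mathrm{Uniform}[1,200]$ as a uniform draw over integers is an assumption; the source only states the 5% no-motion rate and the range.

```python
import random


def sample_num_tracks(rng, p_no_motion=0.05, k_max=200):
    """Return k = 0 with probability p_no_motion, else an integer in [1, k_max]."""
    if rng.random() < p_no_motion:
        return 0  # no motion condition: the sample is a plain I2V example
    return rng.randint(1, k_max)


def sample_tracks(all_tracks, rng):
    """Pick a random subset of the tracked trajectories for one training sample."""
    k = min(sample_num_tracks(rng), len(all_tracks))
    return rng.sample(all_tracks, k)
```

Keeping a small share of samples without tracks lets the fine-tuned backbone still behave like the original I2V model when no motion is specified.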

At inference, classifier-free guidance is performed with $\tilde{v} = v_\text{uncond} + w\,(v_\text{cond}-v_\text{uncond})$, with $w\approx5.0$.
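As a small worked example of the guidance rule, the sketch below applies it element-wise to two velocity predictions represented as flat Python lists. The list representation and the function name are assumptions for illustration; in practice these would be tensors produced by the backbone.

```python
def classifier_free_guidance(v_uncond, v_cond, w=5.0):
    """Extrapolate from the unconditional toward the conditional prediction."""
    return [u + w * (c - u) for u, c in zip(v_uncond, v_cond)]
```

With `w = 1` the result equals the conditional prediction; larger `w` pushes further in the direction the condition suggests.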

Results on MoveBench—a curated benchmark of 1,018 videos with dense human+SAM annotations and 54 categories—demonstrate that Wan-Move achieves leading motion control: FID=12.2, FVD=83.5, EPE=2.6 px, surpassing Tora, MagicMotion, and even commercial solutions such as Kling 1.5 Pro’s Motion Brush in both automatic and human evaluations. Ablations establish that latent feature replication outperforms naive pixel copying, and that increasing trajectory density enhances motion fidelity (EPE down to $1.1$ px at 1024 tracks).

Wan-Move also generalizes over backbone and data scale, supporting applications such as multi-object dragging, camera and 3D rotations, and cross-scene transfer through variable trajectory input.

2. WAN Traffic Offloading in Real-Time Conferencing: The Wan-Move System

The Wan-Move paradigm (Kataria et al., 2024) in wide-area networking refers to a framework for strategically offloading a fraction of real-time conferencing traffic from dedicated WANs onto the public Internet. The primary goal is to curtail network costs while meeting strict Quality-of-Service (QoS) and quality-of-experience (QoE) requirements.

The system evolves through several core phases:

a) Large-Scale Latency Measurement:

A measurement campaign collects 3.5 million one-way latency measurements per day, spanning 241,000 source cities and 21 conferencing DCs, enabling region-specific comparisons between WAN and Internet performance. Results reveal that, in over 60% of measured hours in North America and Europe, Internet latency is within 10 ms of WAN latency (and is sometimes lower), and that loss rates are comparable in most intervals.

b) Titan: Production Internet-Offload Controller:

A controller “moves” calls to the Internet based on predictive demand forecasting (Holt–Winters smoothing), respecting per-configuration offload budgets. The system ensures calls are rerouted only if latency and packet loss remain below thresholds. Operational deployment across select regions confirmed:

  • P90 latency inflation ≤10 ms, packet loss increase <0.01%
  • User-reported QoE unchanged
  • Up to 20% reduction in peak WAN bandwidth
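The forecast-then-gate logic described for the controller can be sketched as below, assuming an additive Holt–Winters model with a simple two-season initialization. The function names, the smoothing constants, and the `max_latency_ms` and `max_loss_pct` parameters are illustrative assumptions; the source does not state Titan's actual model configuration or threshold values.

```python
import math


def holt_winters_forecast(y, m, alpha=0.5, beta=0.5, gamma=0.5, h=1):
    """Additive Holt-Winters forecast h steps past the end of series y.

    m is the season length; at least two full seasons of history are needed.
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of history")
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] - level for i in range(m)]
    for t in range(m, len(y)):
        prev_level = level
        level = alpha * (y[t] - season[t % m]) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * season[t % m]
    return level + h * trend + season[(len(y) + h - 1) % m]


def offload_quota(forecast_calls, budget_fraction, latency_inflation_ms,
                  loss_increase_pct, *, max_latency_ms, max_loss_pct):
    """Number of calls to move to the Internet for one configuration.

    Returns 0 when the Internet path violates either (hypothetical) threshold,
    otherwise the budgeted share of the forecast demand.
    """
    if latency_inflation_ms > max_latency_ms or loss_increase_pct > max_loss_pct:
        return 0
    return math.floor(forecast_calls * budget_fraction)
```

Gating on path quality before applying the budget means a degraded Internet path offloads nothing, regardless of how much budget is available.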

c) Titan-Next: Co-Optimal Routing and Server Assignment:

The research prototype formulates server selection/routing as an integer program over:

$\min_{x,U} \sum_{e\in E} U_e$ subject to:

  • $\sum_{s,r} x_{c,s,r} = 1$ for each call $c$ (completeness),
  • $\sum_{c,r} d_c\, x_{c,s,r} \leq \text{cap}_s$ for each server $s$ (server capacity),
  • $\sum_{c,s} A_{s,e}\, d_c\, x_{c,s,\text{WAN}} \leq U_e$ for each link $e$ (WAN link peak enforcement),
  • $x_{c,s,r}\in\{0,1\}$.
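To make the formulation concrete, the sketch below solves a tiny instance by exhaustive enumeration instead of an integer-programming solver. It is an illustration under stated assumptions, not the Titan-Next implementation: the function and argument names are hypothetical, and the `internet_ok` flag is an added assumption that restricts which calls may use the Internet. The constraints as written above place no limit on Internet routing, so without such a restriction the all-Internet assignment would trivially minimize the objective.

```python
from itertools import product


def solve_assignment(demand, cap, uses_link, internet_ok):
    """Brute-force the server/route assignment that minimizes total WAN load.

    demand[c]      demand d_c of call c
    cap[s]         capacity of server s
    uses_link[s]   WAN links e with A[s, e] = 1
    internet_ok[c] whether call c may be routed over the Internet (assumption)
    Returns (cost, assignment, link_utilization), or None if infeasible.
    """
    calls, servers = list(demand), list(cap)
    links = sorted({e for s in servers for e in uses_link[s]})
    choices = [
        [(s, r) for s in servers for r in ("WAN", "INTERNET")
         if r == "WAN" or internet_ok[c]]
        for c in calls
    ]
    best = None
    for combo in product(*choices):  # exactly one (server, route) per call
        load = {s: 0.0 for s in servers}
        util = {e: 0.0 for e in links}
        for c, (s, r) in zip(calls, combo):
            load[s] += demand[c]
            if r == "WAN":
                for e in uses_link[s]:
                    util[e] += demand[c]
        if any(load[s] > cap[s] for s in servers):
            continue  # violates a server capacity constraint
        cost = sum(util.values())
        if best is None or cost < best[0]:
            best = (cost, dict(zip(calls, combo)), util)
    return best
```

Enumeration grows exponentially with the number of calls, so it only serves to check the formulation on toy inputs; a production-scale instance would need a real solver.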

Joint optimization cuts WAN peak load by up to 61% vs all-WAN (median 45%), and 30–40% over static region split. QoS trade-off remains tightly bounded: median one-way latency increase +5 ms, with no statistically significant MOS degradation.

d) Operational Guidelines:

The phased deployment strategy recommends pilot measurement/calibration, “dark launch,” and dynamic fail-open/failback logic when path performance degrades. Internet egress is typically 3–5× cheaper than WAN; dynamic thresholds (e.g., 20 ms one-way) and health checks are critical for safe operation.

3. WAN-Move for Energy-Efficient Wide-Area Cloud VM Migration

The WAN-Move technique proposed by Kuribayashi (Kuribayashi, 2013) addresses QoS and power efficiency in wide-area live migration of cloud VMs. When a VM is migrated over large distances, end-to-end RTT increases and throughput may fall, threatening both application performance and energy efficiency.

a) System Overview:

Upon detection of potentially suboptimal post-migration RTT (e.g., exceeding 100 ms) or bandwidth drops, the orchestrator launches WAN accelerator instances at both the target DC and client edge. These accelerators terminate and regenerate TCP connections, leveraging techniques including window scaling, ACK-proxying, compression, and caching.

b) mSCTP-Based Data Handoff:

Owing to the requirement of a new TCP handshake for each WAN-optimized path, the controller orchestrates IP handover using mSCTP’s dynamic address features (RFC 5061), maintaining association during migration:

  1. Pre-migration: mSCTP association IP$_1$↔IP$_3$, with normal WAN acceleration (if present).
  2. VM migration: “Teleport” to Center 2; ASCONF advertises new VM IP.
  3. Post-migration: TCP connection is established to new IP via the WAN accelerator; old path is retired after data confirmation.

c) Analytical Model and Empirical Results:

Total energy consumption is

$$E_\text{total}(R,D) = [P_\text{link}(R) + P_\text{fixed}] \cdot T(R,D),\quad T(R,D)=\frac{S}{R}+\kappa D$$
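The energy model can be evaluated directly once its inputs are fixed. The sketch below assumes that $S$ is the amount of data transferred, $R$ the achieved transfer rate, $D$ the delay, and $\kappa$ a coefficient scaling the delay contribution; the source does not define these symbols explicitly, and the form of $P_\text{link}(R)$ is left to the caller for the same reason.

```python
def transfer_time(S, R, D, kappa):
    """T(R, D) = S / R + kappa * D."""
    return S / R + kappa * D


def total_energy(S, R, D, kappa, p_fixed, p_link):
    """E_total(R, D) = [P_link(R) + P_fixed] * T(R, D).

    p_link is a caller-supplied function of R (its shape is an assumption).
    """
    return (p_link(R) + p_fixed) * transfer_time(S, R, D, kappa)
```

Because power is multiplied by transfer time, any acceleration that raises $R$ or reduces the delay term lowers energy roughly in proportion to the time saved, as long as $P_\text{link}(R)$ grows slowly with $R$.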

Empirically, energy reductions of up to a factor of 10 were observed for 500 ms one-way delays when using acceleration (FTP transfer time and energy both cut to $\approx 10\%$ of baseline). Insertion/teardown overheads were consistently under 5% of migration time.

| Delay (one-way) [ms] | FTP time $T(d)$ (normalized) |
|---|---|
| 0 | 0.25 |
| 50 | 0.10 |
| 100 | 0.08 |
| 200 | 0.07 |

d) Deployment Recommendations:

Threshold-driven activation, on-demand accelerator leasing, integrated throughput/power feedback, and optimal accelerator placement are recommended. Acceleration is paired with data compression, deduplication, and TCP scaling to maximize link utilization and minimize energy.

4. Comparative Summary of Wan-Move Approaches

| Domain | Core Mechanism | Key Metrics/Findings |
|---|---|---|
| Video Generation (Chu et al., 9 Dec 2025) | Latent trajectory editing | FID=12.2, EPE=2.6 px, no auxiliary motion encoder, surpasses commercial SOTA |
| Conferencing Traffic (Kataria et al., 2024) | Internet offload control | Up to 61% WAN peak-load reduction, P90 latency inflation ≤10 ms, no QoE loss |
| VM Migration (Kuribayashi, 2013) | Dynamic WAN acceleration | Up to 10× energy savings, shorter transfer time, sub-5% migration overhead |

Each Wan-Move framework leverages domain-specific insights: direct latent editing for fine motion control (video), adaptive routing to exploit public Internet performance–cost trade-offs (networking), and protocol-driven, dynamic acceleration to preserve QoS and power in cloud VM migration.

5. Broader Impact, Limitations, and Application Scenarios

Across domains, Wan-Move solutions advance state-of-the-art control granularity, resource efficiency, and scalability. In video synthesis, the removal of auxiliary encoders simplifies scaling and broadens backbone applicability, while achieving unprecedented motion control fidelity—enabling new applications from multi-object dragging to semantic camera guidance.

In networking, adaptive call offloading offers cloud operators a rigorously validated blueprint for reducing operational costs while retaining strict performance ceilings. A plausible implication is that such methods will see rapid adoption in both private and hybrid-cloud conferencing deployments as WAN costs dominate OPEX.

For cloud computing, dynamically instantiated WAN acceleration optimizes both service performance and energy consumption, accommodating the increasing frequency of distributed VM migrations in response to load, energy price signals, or legal constraints. These findings collectively suggest that the Wan-Move concept is broadly extensible wherever fine-grained, latency-aware, and scalable interventions are needed across distributed digital infrastructure.
