
VL-PUB Module Overview

Updated 7 February 2026
  • VL-PUB Module is the publisher-side computation unit in the PubSub-VFL framework that enables privacy-preserving vertical federated learning through split learning.
  • It employs asynchronous embedding publication, Gaussian differential privacy noise injection, and gradient backpropagation with synchronized parameter updates to handle heterogeneous resources.
  • Empirical results demonstrate a 2–7× training speedup and high resource utilization while ensuring linear convergence under strict privacy guarantees.

The VL-PUB module is the publisher-side orchestration and computation unit in the PubSub-VFL framework for two-party vertical federated learning (VFL). Engineered for efficient, privacy-preserving split learning between organizations with heterogeneous resources, VL-PUB operationalizes asynchronous embedding publication, local differential privacy (DP) protection, gradient backpropagation, and synchronized parameter updates. It forms the "passive party" (denoted $P_p$) counterpart to the subscriber, functioning both as an independent compute engine and as a tightly integrated publisher node in the Publisher/Subscriber (Pub/Sub) layering of PubSub-VFL, and achieves substantial resource utilization and training acceleration (Liu et al., 14 Oct 2025).

1. Architectural and Functional Overview

VL-PUB executes on the side of the passive participant $P_p$ in a two-party VFL arrangement. Its duties include:

  • Sampling local feature minibatches $\{x_i^p\}_{i\in B}$ and computing embeddings $z^p = f_p(x^p;\theta_p)$ using the bottom model.
  • Applying Gaussian DP noise: $z^p \leftarrow z^p + \xi_{dp}$, where $\xi_{dp} \sim \mathcal{N}(0,\sigma_{dp}^2 I)$.
  • Publishing these noisy embeddings into an embedding channel $\mathcal{C}_e[b]$ keyed by batch identifier.
  • On receipt of the corresponding embedding gradient $\nabla_{z^p}\mathcal{L}$ from the subscriber, performing backpropagation to obtain parameter gradients $\nabla_{\theta_p}\mathcal{L}$ and pushing updated parameters to the local Parameter Server (PS).
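As a concrete illustration, the forward-and-perturb duties can be sketched in a few lines of NumPy. The linear bottom model, the `publish_embedding` helper, and the value of `sigma_dp` are hypothetical stand-ins, not the paper's implementation:

```python
import numpy as np

def publish_embedding(x_batch, bottom_model, sigma_dp, rng):
    """Compute bottom-model embeddings and add Gaussian DP noise before publishing."""
    z = bottom_model(x_batch)                        # z^p = f_p(x^p; theta_p)
    noise = rng.normal(0.0, sigma_dp, size=z.shape)  # xi_dp ~ N(0, sigma_dp^2 I)
    return z + noise                                 # noisy embedding for C_e[b]

# usage with a linear bottom model as a stand-in
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))                          # bottom-model weights
x = rng.normal(size=(8, 4))                          # minibatch of 8 samples
z_noisy = publish_embedding(x, lambda v: v @ W, sigma_dp=0.1, rng=rng)
print(z_noisy.shape)  # (8, 3)
```

Because the noise is added before publication, the subscriber only ever sees the perturbed embedding.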

The module interfaces asynchronously with VL-SUB (the "active" party) via the embedding ($\mathcal{C}_e$) and gradient Pub/Sub channels, and with the local PS via scheduled aggregation/broadcast. All updates and synchronizations accommodate data and system heterogeneity, bounded staleness, and strict privacy constraints.

Data and Gradient Workflow

  1. VL-PUB samples a minibatch $\{x_i^p\}_{i\in B}$.
  2. Computes embeddings $z^p = f_p(x^p;\theta_p)$.
  3. Perturbs with Gaussian DP noise: $z^p \leftarrow z^p + \xi_{dp}$.
  4. Publishes the noisy $z^p$ to the embedding channel $\mathcal{C}_e[b]$.
  5. After VL-SUB computes the loss, pulls the gradient $\nabla_{z^p}\mathcal{L}$ from the gradient channel.
  6. Backpropagates through $f_p$, updates $\theta_p$, pushes to the local PS.
  7. At semi-async intervals, the PS aggregates local model copies and rebroadcasts fresh parameters.

Diagrammatic flow:

sample $\{x_i^p\}_{i\in B}$ → embed $z^p = f_p(x^p;\theta_p)$ → perturb $z^p + \xi_{dp}$ → publish $\mathcal{C}_e[b]$ → pull $\nabla_{z^p}\mathcal{L}$ → backprop → PS sync

2. Hierarchical Asynchronous Update Logic

VL-PUB supports a two-level asynchronous paradigm:

  • Pub/Sub Asynchrony (Cross-Party): Embedding and gradient exchanges between publisher and subscriber occur via decoupled, FIFO Pub/Sub channels.
  • Semi-Asynchronous PS Updates (Within-Party): The local PS aggregates worker parameter updates every $K$ local steps (the PS-sync interval), broadcasting the average to all workers of the party.
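The within-party PS step can be illustrated with a minimal averaging sketch; the two-element parameter vectors and the `ps_aggregate` helper are invented for illustration:

```python
import numpy as np

def ps_aggregate(worker_params):
    """Average the workers' local model copies and broadcast the result back."""
    avg = np.mean(worker_params, axis=0)
    return [avg.copy() for _ in worker_params]   # one fresh copy per worker

# three workers have drifted apart; one sync round realigns them
workers = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
workers = ps_aggregate(workers)
print(workers[0])  # [3. 4.]
```

Between sync rounds the workers proceed independently, which is what makes the scheme only semi-asynchronous rather than lock-step.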

A pseudo-code summary of the VL-PUB worker loop:

  repeat:
    sample minibatch $\{x_i^p\}_{i\in B}$
    $z^p \leftarrow f_p(x^p;\theta_p) + \xi_{dp}$
    publish $z^p$ to $\mathcal{C}_e[b]$
    pull $\nabla_{z^p}\mathcal{L}$ for batch $b$ (within the staleness bound)
    $\theta_p \leftarrow \theta_p - \eta\,\nabla_{\theta_p}\mathcal{L}$
    every $K$ steps: push $\theta_p$ to the PS; pull the averaged parameters
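Under simplifying assumptions (a single process, latency-free channels, and a stub subscriber whose top model treats the noisy embedding itself as the prediction under a squared loss toward zero), the worker loop can be simulated end to end. Every name and constant here is illustrative:

```python
import numpy as np

def vl_pub_worker(X, W, steps, batch, sigma_dp, lr, K, rng):
    """Toy single-process VL-PUB loop against a stub subscriber."""
    ps_copy = W.copy()                            # parameter-server replica
    for t in range(steps):
        idx = rng.choice(len(X), size=batch, replace=False)
        xb = X[idx]                               # sample local minibatch
        z = xb @ W                                # forward: z^p = f_p(x^p; theta_p)
        z += rng.normal(0.0, sigma_dp, z.shape)   # Gaussian DP perturbation
        # "publish" z; the stub subscriber replies with dL/dz for L = 0.5*||z||^2
        g_z = z
        g_W = xb.T @ g_z / batch                  # backprop through bottom model
        W = W - lr * g_W                          # local parameter update
        if (t + 1) % K == 0:                      # semi-async PS sync
            ps_copy = (ps_copy + W) / 2.0         # aggregate two replicas
            W = ps_copy.copy()                    # broadcast fresh parameters
    return W

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 4))
W = vl_pub_worker(X, rng.normal(size=(4, 2)), steps=200, batch=8,
                  sigma_dp=0.01, lr=0.1, K=10, rng=rng)
print(np.linalg.norm(W))  # shrinks toward the DP-noise variance floor
```

A real deployment would replace the in-process hand-off with Pub/Sub channel clients and enforce the staleness bound on pulled gradients.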

Update Equations:

$\theta_p^{t+1} = \theta_p^t - \eta\,\nabla_{\theta_p}\mathcal{L}\big(z^{p,\,t-\tau_t}\big), \qquad z^{p,\,t-\tau_t} = f_p(x^p;\theta_p^{t-\tau_t}) + \xi_{dp}$

where $\tau_t \le \tau_{\max}$ (maximum staleness) and $\xi_{dp}$ is DP noise.

Staleness Bound:

Every gradient consumed at step $t$ is computed from parameters at most $\tau_{\max}$ steps old: $0 \le \tau_t \le \tau_{\max}$ for all $t$.

3. Heterogeneity-Aware Optimization Problem

VL-PUB is engineered to adaptively optimize performance under resource and data heterogeneity by formalizing and solving a discrete minimax latency problem:

Objective:

$\min \; \max\big(T_{\text{act}},\, T_{\text{pas}}\big)$ over the discrete configuration space,

subject to DP privacy, memory, and resource constraints.

  • $T_{\text{act}}$: end-to-end per-batch time for the subscriber; includes forward and backward passes, the top model, and gradient communication.
  • $T_{\text{pas}}$: end-to-end per-batch time for the publisher; includes forward and backward passes and embedding communication.

Subcomponent Latencies:

Per-batch publisher time decomposes into forward-pass, backward-pass, and embedding-communication terms, each scaling with the local batch size and the compute allotted to the worker (similar expressions hold for the active side).

Communication times scale with the embedding payload, batch size × embedding dimension, over the available channel bandwidth.

Privacy Constraint:

$\sigma_{dp} \ge \Delta_2 / \mu$

for $\mu$-GDP, where $\Delta_2$ is the $\ell_2$-sensitivity of a published embedding.
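Under Gaussian DP, a mechanism with $\ell_2$-sensitivity $\Delta$ and noise standard deviation $\sigma$ satisfies $\mu$-GDP with $\mu = \Delta/\sigma$, so calibration is a one-liner. The helper name and the clipping-norm example below are assumptions:

```python
def gdp_sigma(sensitivity: float, mu: float) -> float:
    """Std of Gaussian noise giving mu-GDP for a query with the given l2-sensitivity."""
    if mu <= 0 or sensitivity < 0:
        raise ValueError("need mu > 0 and sensitivity >= 0")
    return sensitivity / mu

# embeddings clipped to l2 norm 1.0, targeting mu = 0.5 (tighter privacy)
print(gdp_sigma(1.0, 0.5))  # 2.0
```

Smaller $\mu$ means stronger privacy and therefore more noise, which feeds directly into the variance floor of the convergence bound below.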

Dynamic Programming Solution:

The discrete minimax problem is solved by a dynamic programming routine that enumerates feasible configurations, prunes those violating the privacy, memory, and resource constraints, and selects the configuration with minimal $\max(T_{\text{act}}, T_{\text{pas}})$.
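The paper's dynamic program is not reproduced here; as a sketch of the same minimax selection, an exhaustive search over a small configuration grid with toy latency models (both models are invented) looks like:

```python
def best_config(batch_sizes, workers, t_act, t_pas):
    """Exhaustively pick the (batch size, worker count) minimizing max latency."""
    return min(((b, w) for b in batch_sizes for w in workers),
               key=lambda c: max(t_act(*c), t_pas(*c)))

# toy latency models: compute time shrinks with workers, comm grows with batch
t_act = lambda b, w: 0.5 * b / w + 0.01 * b   # active-side per-batch time
t_pas = lambda b, w: 0.8 * b / w + 0.02 * b   # passive-side per-batch time
print(best_config([32, 64, 128], [1, 2, 4], t_act, t_pas))  # (32, 4)
```

Dynamic programming replaces this brute force when the configuration grid is large, but the objective being minimized is the same bottleneck latency.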

4. Convergence and Privacy Guarantees

Convergence of the VL-PUB module (and thus the overall PubSub-VFL framework) is rigorously characterized:

Theorem 5.1 (Liu et al., 14 Oct 2025), under standard convexity, smoothness, bounded-variance, bounded-staleness, and DP assumptions:

$\mathbb{E}\big[\mathcal{L}(\theta^t) - \mathcal{L}^*\big] \le (1 - \eta c)^t \big(\mathcal{L}(\theta^0) - \mathcal{L}^*\big) + \mathcal{O}\big(\eta\,(\sigma_{sgd}^2 + \sigma_{dp}^2)\big)$

for a strong-convexity constant $c$ and stepsize $\eta$.

Thus, the process achieves linear convergence up to a variance floor dependent on the sum of SGD and DP noises.

DP compatibility follows from the independence and zero-mean property of the injected GDP noise; the only consequence is a higher asymptotic variance. Convergence and privacy results remain intact for a small enough stepsize $\eta$, even under bounded staleness.

5. Empirical Acceleration and Resource Utilization

VL-PUB yields substantial practical speedup and efficient hardware utilization:

Method      Time (s)  Speedup  CPU Util (%)
AVFL-PS     885.01    1.0×     76.2
PubSub-VFL  124.01    7.14×    89.97

Across five benchmark datasets, PubSub-VFL (with VL-PUB) achieved a 2–7× speedup and CPU utilization up to 91.07%, with comparable or improved test accuracy (Liu et al., 14 Oct 2025).

Per-batch computational costs at VL-PUB:

  • Forward: one bottom-model pass $f_p$ over the minibatch
  • Backward: a comparable backward pass through $f_p$
  • Communication: transmit the noisy embedding and receive the matching gradient, $|B| \cdot d$ floats each way for batch size $|B|$ and embedding dimension $d$
  • PS aggregation: one parameter average every $K$ local steps (the PS-sync interval)
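The per-batch communication volume is easy to compute explicitly; the helper and the float32 example are illustrative:

```python
def embed_comm_bytes(batch_size: int, embed_dim: int, bytes_per_float: int = 4) -> int:
    """Payload of one embedding publish (and, symmetrically, one gradient pull)."""
    return batch_size * embed_dim * bytes_per_float

# a 512-sample batch of 128-dimensional float32 embeddings
print(embed_comm_bytes(512, 128))  # 262144 bytes = 256 KiB
```

This linear dependence on batch size is what couples the communication term in the latency model to the batch-size decision in the minimax optimization.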

6. Implementation and Integration

Basic Integration:

  • VL-PUB functions as an asynchronous publisher within any VFL codebase.
  • Each worker requires:
    • a Pub/Sub client for the embedding ($\mathcal{C}_e$) and gradient channels
    • a small FIFO buffer of in-flight batches
    • a waiting deadline for overdue gradients
    • a DP noise generator ($\sigma_{dp}$ calibrated via Eq. (25))
    • a semi-async PS-sync interval
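The per-worker requirements above can be bundled into a single configuration object; the field names and default values below are illustrative, not the paper's defaults:

```python
from dataclasses import dataclass

@dataclass
class VLPubWorkerConfig:
    """Per-worker configuration; every name and default here is illustrative."""
    embed_channel: str = "C_e"        # Pub/Sub topic for noisy embeddings
    grad_channel: str = "C_g"         # Pub/Sub topic for embedding gradients
    buffer_size: int = 8              # FIFO buffer of in-flight batches
    wait_deadline_s: float = 5.0      # max wait for an overdue gradient
    sigma_dp: float = 0.1             # DP noise std (calibrate via Eq. (25))
    ps_sync_interval: int = 16        # semi-async PS aggregation period

cfg = VLPubWorkerConfig(sigma_dp=0.2)
print(cfg.sigma_dp, cfg.ps_sync_interval)  # 0.2 16
```

Grouping the knobs this way makes it straightforward to re-solve the latency optimization and swap in a new configuration without touching the worker loop.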

Hyperparameter Selection:

  • Run a synchronous profiling round to estimate per-party compute and communication times.
  • Solve the latency minimax, or use the provided dynamic programming code, for the optimal configuration.
  • Monitor channel latencies; adjust the waiting deadline or buffer sizes as needed.
  • Re-solve the optimization if data imbalance emerges.

VL-PUB thereby implements lightweight yet robust asynchronous publication, gradient consumption, and parameter updates with bounded staleness, strict privacy via GDP, and explicit adaptation to resource heterogeneity. It achieves notable end-to-end acceleration (2–7×) and near-optimal hardware efficiency (up to 91.07% CPU utilization) in empirical testing (Liu et al., 14 Oct 2025).
