VL-PUB Module Overview
- VL-PUB Module is the publisher-side computation unit in the PubSub-VFL framework that enables privacy-preserving vertical federated learning through split learning.
- It employs asynchronous embedding publication, Gaussian differential privacy noise injection, and gradient backpropagation with synchronized parameter updates to handle heterogeneous resources.
- Empirical results demonstrate a 2–7× training speedup and high resource utilization while ensuring linear convergence under strict privacy guarantees.
The VL-PUB module is the publisher-side orchestration and computation unit in the PubSub-VFL framework for two-party vertical federated learning (VFL). Engineered for efficient, privacy-preserving split learning between organizations with heterogeneous resources, VL-PUB operationalizes asynchronous embedding publication, local differential privacy (DP) protection, gradient backpropagation, and synchronized parameter updates. Achieving substantial resource utilization and training acceleration, it forms the passive-party counterpart to the subscriber, functioning as both an independent compute engine and a tightly integrated publisher node in the Publisher/Subscriber (Pub/Sub) layering of PubSub-VFL (Liu et al., 14 Oct 2025).
1. Architectural and Functional Overview
VL-PUB executes on the side of the passive participant in a two-party VFL arrangement. Its duties include:
- Sampling local feature minibatches and computing embeddings using the bottom model.
- Applying Gaussian DP noise: $\tilde{h} = h + \xi$, where $\xi \sim \mathcal{N}(0, \sigma^2 I)$.
- Publishing these noisy embeddings into an embedding channel keyed by batch identifier.
- On receipt of the corresponding embedding gradient from the subscriber, performing backpropagation to obtain parameter gradients and pushing updated parameters to the local Parameter Server (PS).
The module interfaces asynchronously with both the VL-SUB ("active" party), via dedicated embedding and gradient Pub/Sub channels, and with the local PS via scheduled aggregation/broadcast. All updates and synchronizations accommodate data and system heterogeneity, bounded staleness, and strict privacy constraints.
Data and Gradient Workflow
- VL-PUB samples a minibatch $x_B$ of local features.
- Computes the embedding $h = f(x_B; \theta_p)$ with the bottom model.
- Perturbs it with Gaussian DP noise: $\tilde{h} = h + \xi$, $\xi \sim \mathcal{N}(0, \sigma^2 I)$.
- Publishes $\tilde{h}$ to the embedding channel, keyed by batch identifier.
- After VL-SUB computes the top-model loss, pulls the embedding gradient $\partial \ell / \partial \tilde{h}$ from the gradient channel.
- Backpropagates through the bottom model, updates $\theta_p$, and pushes the result to the local PS.
- At semi-async intervals, the PS aggregates local model copies and rebroadcasts fresh parameters.
Diagrammatic flow: sample minibatch → bottom-model forward → DP perturbation → publish $\tilde{h}$ → pull $\partial \ell / \partial \tilde{h}$ → backpropagate → PS sync.
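The per-batch publish step above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear `bottom_model`, the dict-based channel, and the noise scale `sigma` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def bottom_model(x, theta):
    # Toy linear bottom model: embedding h = x @ theta.
    return x @ theta

def publish_embedding(channel, batch_id, x, theta, sigma):
    h = bottom_model(x, theta)                    # forward pass on local features
    noise = rng.normal(0.0, sigma, size=h.shape)  # Gaussian DP perturbation
    channel[batch_id] = h + noise                 # publish keyed by batch identifier
    return channel[batch_id]

channel = {}
x = rng.standard_normal((4, 8))      # minibatch: 4 samples, 8 local features
theta = rng.standard_normal((8, 3))  # bottom-model weights -> 3-dim embedding
h_tilde = publish_embedding(channel, batch_id=0, x=x, theta=theta, sigma=0.1)
```

The dict stands in for the FIFO Pub/Sub channel; a real deployment would replace it with the broker client.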
2. Hierarchical Asynchronous Update Logic
VL-PUB supports a two-level asynchronous paradigm:
- Pub/Sub Asynchrony (Cross-Party): Embedding and gradient exchanges between publisher and subscriber occur via decoupled, FIFO Pub/Sub channels.
- Semi-Asynchronous PS Updates (Within-Party): The local PS aggregates worker parameter updates every $\tau$ steps, broadcasting the average to all party workers.
The VL-PUB worker loop interleaves these stages: sample and embed a minibatch, perturb and publish, pull the matching embedding gradient, backpropagate, and periodically synchronize with the local PS.
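A minimal single-process sketch of such a worker loop, assuming a linear bottom model and a subscriber simulated inline by a quadratic stand-in loss (all names, constants, and the averaging rule are illustrative, not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.standard_normal((8, 3)) * 0.1  # bottom-model weights
ps_copy = theta.copy()                     # local Parameter Server state
tau, eta, sigma = 4, 0.05, 0.01            # PS-sync interval, step size, DP noise std

for step in range(12):
    x = rng.standard_normal((16, 8))               # 1) sample local minibatch
    h = x @ theta                                  # 2) bottom-model forward
    h_tilde = h + rng.normal(0.0, sigma, h.shape)  # 3) Gaussian DP perturbation
    # 4) publish h_tilde; the subscriber's top model is simulated here by the
    #    loss 0.5*||h_tilde||^2, whose gradient w.r.t. h_tilde is h_tilde itself.
    grad_h = h_tilde                               # 5) pulled embedding gradient
    grad_theta = x.T @ grad_h / len(x)             # 6) backprop through bottom model
    theta = theta - eta * grad_theta               # 7) local SGD update
    if (step + 1) % tau == 0:                      # 8) semi-async PS aggregation:
        ps_copy = 0.5 * (ps_copy + theta)          #    average with the PS copy...
        theta = ps_copy.copy()                     #    ...and pull back fresh params
```

In the real system steps 4-5 cross the Pub/Sub broker asynchronously, so the pulled gradient may be up to $\tau_{\max}$ steps stale.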
Update Equations:
$$\theta_{t+1} = \theta_t - \eta \, g\big(\theta_{t-\tau_t};\, \tilde{h}\big)$$
where $\tau_t \le \tau_{\max}$ (maximum staleness) and $\xi \sim \mathcal{N}(0, \sigma^2 I)$ is the DP noise carried by the perturbed embedding $\tilde{h}$.
Staleness Bound: every gradient applied at step $t$ was computed from parameters at most $\tau_{\max}$ steps old, i.e., $\tau_t \le \tau_{\max}$ for all $t$.
3. Heterogeneity-Aware Optimization Problem
VL-PUB is engineered to adaptively optimize performance under resource and data heterogeneity by formalizing and solving a discrete minimax latency problem:
Objective:
$$\min \; \max\big(T_{\mathrm{pub}},\, T_{\mathrm{sub}}\big)$$
subject to DP privacy, memory, and resource constraints.
- $T_{\mathrm{sub}}$: end-to-end per-batch time for the subscriber; includes forward and backward passes, the top model, and gradient communication.
- $T_{\mathrm{pub}}$: end-to-end per-batch time for the publisher; includes forward and backward passes and embedding communication.
Subcomponent Latencies:
$$T_{\mathrm{pub}} = T_{\mathrm{fwd}} + T_{\mathrm{bwd}} + T_{\mathrm{comm}}$$
(similar expressions for the active side).
Communication times: payload size over link bandwidth, e.g., $T_{\mathrm{comm}} = B d \, b / W$ for batch size $B$, embedding dimension $d$, $b$ bytes per float, and bandwidth $W$.
Privacy Constraint:
$$\sigma \ge \frac{\Delta}{\mu}$$
for $\mu$-GDP, where $\Delta$ is the (clipped) sensitivity of the published embedding.
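The Gaussian mechanism underlying this constraint can be sketched as follows; clipping each embedding row to bound sensitivity is a standard recipe, but the function names and the per-row clipping convention are illustrative assumptions, not the paper's exact calibration (which the text attributes to its Eq. (25)).

```python
import numpy as np

def gdp_sigma(sensitivity, mu):
    # Gaussian mechanism: noise std sigma = Delta / mu satisfies mu-GDP.
    return sensitivity / mu

def clip_and_perturb(h, clip_norm, mu, rng):
    # Clip each embedding row to L2 norm <= clip_norm so the sensitivity
    # of the published tensor is bounded, then add calibrated noise.
    norms = np.linalg.norm(h, axis=1, keepdims=True)
    h_clipped = h * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    sigma = gdp_sigma(clip_norm, mu)
    return h_clipped + rng.normal(0.0, sigma, size=h.shape)

rng = np.random.default_rng(2)
h = rng.standard_normal((5, 4)) * 10.0       # raw embeddings, large norms
h_priv = clip_and_perturb(h, clip_norm=1.0, mu=0.5, rng=rng)
```

A smaller privacy parameter $\mu$ forces a larger $\sigma$, which is exactly the variance-floor trade-off in the convergence analysis below.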
Dynamic Programming Solution: the discrete minimax problem is solved by a dynamic program that enumerates feasible configurations (e.g., batch sizes and per-party parallelism) and selects the one minimizing the larger of $T_{\mathrm{pub}}$ and $T_{\mathrm{sub}}$.
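For small configuration spaces the selection reduces to exhaustive enumeration, sketched below; the latency models `t_pub`/`t_sub` and the candidate grids are toy assumptions standing in for the profiled latencies.

```python
def best_config(batch_sizes, worker_counts, t_pub, t_sub):
    """Enumerate (B, n) pairs; pick the one minimizing max(publisher, subscriber) latency."""
    best = None
    for B in batch_sizes:
        for n in worker_counts:
            cost = max(t_pub(B, n), t_sub(B, n))
            if best is None or cost < best[0]:
                best = (cost, B, n)
    return best

# Toy latency models: the publisher is faster per sample but has higher fixed overhead.
t_pub = lambda B, n: B / (10 * n) + 0.5
t_sub = lambda B, n: B / (5 * n) + 0.2
best = best_config([32, 64], [1, 2, 4], t_pub, t_sub)  # picks B=32, n=4 here
```

The paper's dynamic program avoids full enumeration, but the objective being minimized is the same max-latency expression.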
4. Convergence and Privacy Guarantees
Convergence of the VL-PUB module (and thus the overall PubSub-VFL framework) is rigorously characterized:
Theorem (5.1 (Liu et al., 14 Oct 2025), under standard convexity, smoothness, bounded variance, staleness, DP):
$$\mathbb{E}\big[F(\theta_T) - F(\theta^\star)\big] \;\le\; (1 - \eta \mu_F)^{T}\,\big(F(\theta_0) - F(\theta^\star)\big) \;+\; \mathcal{O}\!\big(\eta\,(\sigma_{\mathrm{sgd}}^2 + \sigma_{\mathrm{dp}}^2)\big)$$
Thus, the process achieves linear convergence up to a variance floor determined by the sum of the SGD and DP noise variances.
DP compatibility follows from the independence and zero-mean property of the injected GDP noise; the only consequence is a higher asymptotic variance. Convergence and privacy results remain intact for a sufficiently small stepsize $\eta$, even under bounded staleness.
5. Empirical Acceleration and Resource Utilization
VL-PUB yields substantial practical speedup and efficient hardware utilization:
| Method | Time (s) | Speedup | CPU Util (%) |
|---|---|---|---|
| AVFL-PS | 885.01 | 1.0× | 76.2 |
| PubSub-VFL | 124.01 | 7.14× | 89.97 |
Across five benchmark datasets, PubSub-VFL (with VL-PUB) achieved a 2–7× speedup and CPU utilization up to 91.07%, with comparable or improved test accuracy (Liu et al., 14 Oct 2025).
Per-batch computational costs at VL-PUB:
- Forward: one bottom-model pass over the minibatch.
- Backward: similar cost to the forward pass.
- Communication: transmit/receive $B \times d$ floats (embedding out, embedding gradient in).
- PS aggregation: one parameter average every $\tau$ steps.
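The communication cost is easy to size in concrete terms; the batch size and embedding dimension below are assumed illustrative values, not figures from the paper.

```python
# Per-batch payload for one embedding tensor (same size again for its gradient).
B, d = 1024, 128                           # assumed batch size and embedding dim
bytes_per_float = 4                        # float32
payload_bytes = B * d * bytes_per_float    # floats transmitted per batch
payload_kib = payload_bytes / 1024         # convert to KiB
```

At these settings each direction carries 512 KiB per batch, which is what makes the comm terms in the latency model non-negligible on constrained links.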
6. Implementation and Integration
Basic Integration:
- VL-PUB functions as an asynchronous publisher within any VFL codebase.
- Each worker requires:
- Pub/Sub client for the embedding and gradient channels
- Small FIFO buffer for in-flight batches
- Waiting deadline for pulling matching gradients
- DP noise generator ($\sigma$ calibrated via Eq. (25))
- Semi-async PS-sync interval $\tau$
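The per-worker requirements above map naturally onto a small configuration object; every field name and default below is an illustrative assumption, not part of the published interface.

```python
from dataclasses import dataclass

@dataclass
class PubWorkerConfig:
    """Illustrative knobs mirroring the worker-requirements list (names assumed)."""
    embed_channel: str = "emb"      # Pub/Sub embedding channel
    grad_channel: str = "grad"      # Pub/Sub gradient channel
    fifo_capacity: int = 8          # small in-flight batch buffer
    wait_deadline_s: float = 5.0    # give up on a missing gradient after this
    dp_sigma: float = 0.1           # to be calibrated via the paper's Eq. (25)
    ps_sync_interval: int = 16      # semi-async PS aggregation period (steps)

cfg = PubWorkerConfig(ps_sync_interval=32)  # override knobs per deployment
```

Keeping these knobs in one place makes it straightforward to re-run the latency solver and reconfigure workers when heterogeneity shifts.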
Hyperparameter Selection:
- Run a synchronous profiling round to measure per-component compute and communication latencies.
- Solve the minimax latency problem (or use the provided dynamic-programming routine) for the optimal configuration; fall back to reasonable defaults otherwise.
- Monitor channel latencies; adjust the PS-sync interval $\tau$ or buffer sizes as needed.
- Re-solve the latency program if data imbalance emerges.
VL-PUB thereby implements lightweight yet robust asynchronous publication, gradient consumption, and parameter updates with bounded staleness, strict privacy via GDP, and explicit resource-heterogeneity adaptation. It achieves notable end-to-end acceleration (2–7×) and near-optimal hardware efficiency (up to 91.07% CPU utilization) in empirical testing (Liu et al., 14 Oct 2025).