OptFusion: Optimal & Automated Data Fusion

Updated 25 March 2026

OptFusion is a family of methodologies that fuse multimodal data using optimal transport and gradient-based search to achieve enhanced accuracy and robustness.
It employs techniques like automated architecture search, OT-based alignment, and tensorized optical self-attention to outperform traditional fusion methods.
Applications span CTR prediction, quantum computing, and 3D reconstruction, yielding measurable gains in performance, energy efficiency, and real-time processing.

OptFusion denotes a family of methodologies and system architectures for optimal or automated fusion of multimodal data, neural network models, and sensor measurements, with applications ranging from deep learning and optical quantum computing to telecommunications and 3D reconstruction. Across its diverse incarnations, OptFusion is unified by the use of data-driven or optimal transport-based algorithms that systematically search, align, or compute joint representations for fusion tasks, yielding improved task accuracy, robustness, and interpretability compared to ad hoc or fixed fusion strategies.

1. Automated Fusion Learning in Deep Neural Networks

OptFusion in deep learning contexts refers to an automated framework for learning both the topology (“which blocks to fuse?”) and the operations (“how to fuse?”) within deep models—most notably for click-through rate (CTR) prediction. Rather than fixing the fusion scheme as in stacked or parallel baselines, OptFusion parameterizes the network as a directed acyclic graph (DAG) of shallow and deep feature-extracting blocks, and exposes a gradient-optimized search space over both connection structure and fusion operators (ADD, PROD, CONCAT, ATT).

Optimization is achieved via a single-level, joint one-shot algorithm. Binary-valued architecture variables $\{\alpha_{ij}\}$ select connections through a straight-through estimator, while fusion operation selection per block is parameterized by softmaxed weights $\{\beta_j^o\}$. The search alternates between learning regular parameters and updating $\{\alpha,\beta\}$, followed by extracting a concrete, pruned architecture for retraining and inference. OptFusion consistently outperforms handcrafted and neural architecture search (NAS) baselines in CTR prediction, yielding AUC gains of roughly $0.1$ to $0.4$ percentage points over expert fusion networks (e.g., EDCN) and executing full search plus retrain in under $5$ hours on 46M-sample datasets (Zhang et al., 2024).
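As a rough illustration of the search space described above, the following NumPy sketch shows only the forward pass of one fusion block: hard 0/1 connection gates (the forward half of a straight-through estimator) and a softmax mixture over the four candidate operators. The function names, the projection matrix `proj` used to bring CONCAT back to the block width, and the simple score-based form of ATT are illustrative assumptions, not the parameterization of Zhang et al. (2024). Gradients are omitted; in an autodiff framework the hard gate would pass gradients through as if it were the identity.

```python
import numpy as np

OPS = ("ADD", "PROD", "CONCAT", "ATT")

def softmax(x):
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()

def fusion_block(inputs, alpha_logits, beta_logits, proj, att_scores):
    """Forward pass of one searchable fusion block (hypothetical layout).

    inputs       : list of k vectors of width d (outputs of earlier blocks)
    alpha_logits : k connection logits; a positive logit means "connected"
    beta_logits  : 4 operator logits, softmaxed into mixing weights
    proj         : (k*d, d) matrix mapping the concatenation back to width d
    att_scores   : k attention scores (assumed learnable)
    Assumes at least one connection is selected.
    """
    gates = (np.asarray(alpha_logits) > 0).astype(float)   # hard 0/1 gates
    x = np.stack(inputs)                                    # (k, d)
    kept = x * gates[:, None]                               # zero out dropped inputs
    masked_scores = np.where(gates > 0, att_scores, -np.inf)
    candidates = {
        "ADD": kept.sum(axis=0),
        # Dropped inputs contribute a neutral factor of 1 to the product.
        "PROD": np.where(gates[:, None] > 0, x, 1.0).prod(axis=0),
        "CONCAT": kept.reshape(-1) @ proj,
        "ATT": softmax(masked_scores) @ kept,
    }
    weights = softmax(beta_logits)
    return sum(w * candidates[op] for w, op in zip(weights, OPS))

def selected_op(beta_logits):
    """Discretization step: keep the operator with the largest weight."""
    return OPS[int(np.argmax(beta_logits))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    blocks = [rng.normal(size=4) for _ in range(3)]
    out = fusion_block(blocks, [0.3, -1.2, 0.8], [0.1, 0.5, -0.2, 0.0],
                       rng.normal(size=(12, 4)), np.zeros(3))
    print(out.shape, selected_op([0.1, 0.5, -0.2, 0.0]))
```

After the search, the pruned architecture would keep only the connections with open gates and the operator returned by `selected_op`, which is then retrained from scratch.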

Quantitative Results

Dataset	OptFusion-Soft AUC	Best Baseline AUC
Criteo	0.8113	0.8102 (EDCN)
Avazu	0.7938	0.7917 (EDCN)
KDD12	0.8158	0.8122 (EDCN)
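
The gains implied by the table can be checked with a few lines of arithmetic. The snippet below simply subtracts the baseline AUC from the OptFusion-Soft AUC and reports the difference in percentage points; the dictionary layout is an illustrative choice.

```python
# AUC values copied from the table above: (OptFusion-Soft, best baseline).
results = {
    "Criteo": (0.8113, 0.8102),
    "Avazu": (0.7938, 0.7917),
    "KDD12": (0.8158, 0.8122),
}

# Absolute AUC difference, expressed in percentage points.
gains = {name: round((ours - base) * 100, 2) for name, (ours, base) in results.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f} AUC points")
```

The three differences fall between about 0.1 and 0.4 points, which is the range quoted in the text.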

Key design ablations reveal that flexible, block-specific fusion operation choice and joint architecture/operation search drive these improvements.

2. Model Fusion via Optimal Transport

Multiple OptFusion methodologies formalize the fusion of neural network weights or features by posing layer-wise or component-wise alignment as an optimal transport (OT) problem. This approach addresses the permutation symmetry of neural architectures, enabling principled fusion beyond trivial parameter averaging.

The canonical pipeline aligns neurons, heads, or architectural sub-blocks across independently trained models by solving a Kantorovich or entropically-regularized OT problem for each layer, yielding a soft alignment plan that is then used to permute and barycentrically average the weights. This enables “one-shot” knowledge transfer, with or without further fine-tuning, and allows fusion for both identical and differing layer widths, as well as for heterogeneous model sizes (e.g., ViT, BERT, or multi-layer perceptrons) (Singh et al., 2019, Imfeld et al., 2023).
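
A minimal single-layer sketch of this pipeline is given below, assuming uniform mass on every neuron, a squared Euclidean cost between weight rows, and plain (non-log-domain) Sinkhorn iterations. The function names and the `eps` value are illustrative. A full implementation would also propagate the alignment to the input dimension of the next layer, which is omitted here.

```python
import numpy as np

def sinkhorn_plan(cost, eps=0.05, n_iters=500):
    """Entropy-regularized OT plan between two uniform marginals."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-cost / eps)            # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)                # match row marginals
        v = b / (K.T @ u)              # match column marginals
    return u[:, None] * K * v[None, :]

def fuse_layer(w_a, w_b, eps=0.05):
    """Align the neurons (rows) of w_b to those of w_a, then average.

    w_a: (n, d) weights of model A; w_b: (m, d) weights of model B.
    Returns the fused (n, d) weights and the (n, m) transport plan.
    """
    cost = ((w_a[:, None, :] - w_b[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / max(cost.max(), 1e-12)        # scale costs into [0, 1]
    plan = sinkhorn_plan(cost, eps)
    # Barycentric projection: each neuron of A receives a convex
    # combination of the neurons of B that it is matched to.
    w_b_aligned = (plan / plan.sum(axis=1, keepdims=True)) @ w_b
    return 0.5 * (w_a + w_b_aligned), plan

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_a = rng.normal(size=(4, 6))
    w_b = rng.normal(size=(5, 6))      # a different width is allowed
    fused, plan = fuse_layer(w_a, w_b)
    print(fused.shape, plan.shape)
```

When model B is just a neuron-permuted copy of model A, the plan concentrates on the matching permutation and the fused weights coincide with A's, which is the symmetry that plain parameter averaging ignores.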

For continual task-specific fusion, as in OTMF (Optimal Transport-based Masked Fusion), additional masking mechanisms discover shared and task-unique parameter subspaces, enabling sequential, memory-bounded aggregation of models without catastrophic forgetting. The OT metric is empirically minimized between the feature space distributions of merged and constituent models, using Sinkhorn iterations (Pan et al., 24 Nov 2025).

Table: Key OT-based Model Fusion Variants

Method	Alignment Target	Application Domain
OT-neuronwise	Layer neuron activations	MLPs, CNNs (CIFAR, MNIST)
OT-headwise	Attention heads, MLM layers	Transformers (ViT, BERT)
OTMF	Feature distributions + masks	Continual multi-task

Empirical results demonstrate that OT-aligned fusion recovers near-ensemble accuracy (or improves upon parents after fast tuning) with lower computational and memory costs.

3. Optimal Transport Fusion for Multimodal and Cross-model Tasks

In transductive zero-shot learning, OptFusion bridges independently pretrained vision-language models (VLMs) and vision-only foundation models (VFMs) by aligning the distributions of visual and semantic features using an entropy-regularized OT framework. This process produces a fused joint assignment matrix via Sinkhorn-Knopp iterations over a reward matrix that balances feature-based soft posteriors and VLM-derived semantic probabilities.

For $N$ test images $x_i$ and $C$ class prototypical descriptions $s_j$, features from VFM and VLM backbones are extracted, distributions over classes are constructed (GMM posteriors and softmaxed cosine-similarity logits), and the two are then combined in a single OT maximization or cost minimization. Final predictions are made as $\hat y_i = \arg\max_j Q_{ij}$, where $Q$ is the optimal assignment matrix (Xu et al., 16 Jun 2025).
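
The assignment step can be sketched as follows. This is a simplified illustration under stated assumptions: the reward is a convex combination of the two log-posteriors with a hypothetical weight `lam`, the class marginal is taken to be uniform, and the GMM fitting that produces the VFM posteriors is replaced by a precomputed matrix. None of these choices are confirmed details of Xu et al.

```python
import numpy as np

def ot_fuse(p_vfm, p_vlm, lam=0.5, eps=0.1, n_iters=100):
    """Fuse two (N, C) class-probability matrices with Sinkhorn-Knopp.

    p_vfm : soft posteriors from visual features (e.g. a GMM), shape (N, C)
    p_vlm : softmaxed image-text similarities, shape (N, C)
    Returns the assignment matrix Q and the predictions argmax_j Q[i, j].
    """
    reward = (lam * np.log(np.clip(p_vfm, 1e-8, 1.0))
              + (1.0 - lam) * np.log(np.clip(p_vlm, 1e-8, 1.0)))
    q = np.exp((reward - reward.max()) / eps)
    n, c = q.shape
    for _ in range(n_iters):
        q /= q.sum(axis=0, keepdims=True) * c   # each class gets mass 1/C (assumed prior)
        q /= q.sum(axis=1, keepdims=True) * n   # each image gets mass 1/N
    return q, q.argmax(axis=1)

if __name__ == "__main__":
    p_vfm = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]])
    p_vlm = np.array([[0.6, 0.4], [0.55, 0.45], [0.5, 0.5], [0.4, 0.6]])
    q, preds = ot_fuse(p_vfm, p_vlm)
    print(preds)
```

Because the marginal constraints act on the whole test batch at once, the prediction for one image depends on all the others, which is what makes the procedure transductive.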

Metric	CLIP-only	DINOv2-only	OTFusion
Avg Top-1 (%)	64.78	74.20	74.95

OTFusion improves average top-1 accuracy by about 10 percentage points over CLIP-only (64.78% to 74.95%) and by 0.75 points over DINOv2-only, without fine-tuning and using only inference-time OT.

4. Multimodal Fusion in Optical, Quantum, and Sensor Systems

OptFusion also encompasses physical multimodal fusion, including:

Tensorized Optical Multimodal Fusion Networks: Low-rank multimodal fusion (LMF) modules and feed-forward optical self-attention enable photonic integration of vision, audio, and text for sentiment classification, realized via tensorized optical neural network (TONN) cores and Mach–Zehnder interferometer meshes (Zhao et al., 2023). The design requires $51.3\times$ fewer hardware resources and reaches $3.7\times10^{13}$ MAC/J energy efficiency compared with classical baselines.
Quantum OptFusion: In measurement-based linear optical quantum computation, OptFusion generalizes Type-II fusion to arbitrary $d$-dimensional qudits. The protocol, based on Fourier projections and ancillary states engineered by time-bin multiplexing in silicon spin qudits, achieves postselected Bell measurement success probability $P\approx 2/d^2$ (even $d$), outperforming prior constructions by orders of magnitude (e.g., $d=5$: $P=0.067$ vs. $9.2\times10^{-5}$), with concrete hardware proposals for ancilla synthesis (Üstün et al., 22 May 2025).
Network Telemetry Fusion: In software-defined optical networks, OptFusion denotes a data fusion telemetry layer stratified into source, space, and model-level engines, leveraging analytical models, heterogeneous ML, and federated learning. This enables real-time, reduced-bandwidth key performance indicator (KPI) extraction (e.g., ~$10\times$ data volume reduction), robust quality-of-transmission estimation, and closed-loop control (Liu et al., 2020).
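
To make the low-rank fusion step named in the first item above more concrete, the sketch below implements the commonly used low-rank formulation in NumPy: each modality vector is extended with a constant 1, projected by its own rank-$R$ factors, and the projections are multiplied elementwise across modalities and summed over the rank. Whether the photonic implementation uses exactly this form is an assumption, the dimensions are arbitrary, and the optical hardware (interferometer meshes, optical attention) is not modeled.

```python
import numpy as np

def low_rank_fusion(feats, factors):
    """Low-rank multimodal fusion of several modality vectors.

    feats   : list of 1-D arrays, one per modality (widths d_m)
    factors : list of arrays of shape (rank, d_m + 1, d_out)
    Returns a fused vector of width d_out.
    """
    fused = None
    for z, w in zip(feats, factors):
        z1 = np.append(z, 1.0)                    # the constant keeps lower-order terms
        proj = np.einsum("i,rio->ro", z1, w)      # (rank, d_out)
        fused = proj if fused is None else fused * proj
    return fused.sum(axis=0)                      # sum over the rank dimension

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dims, rank, d_out = (5, 3, 4), 2, 6           # e.g. vision, audio, text widths
    feats = [rng.normal(size=d) for d in dims]
    factors = [rng.normal(size=(rank, d + 1, d_out)) for d in dims]
    print(low_rank_fusion(feats, factors).shape)
```

The result equals contracting the full outer product of the three extended vectors with a weight tensor of size $(d_1+1)(d_2+1)(d_3+1)\,d_{out}$, but that tensor is never materialized, which is where the parameter and compute savings come from.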

In microscopy, OpticFusion uses neural implicit surfaces (SDF) to fuse data from white light interferometry (WLI, precise but textureless) and optical microscopy (OM, textured but lacking precise z-depth), producing detailed, color-realistic 3D reconstructions. Calibration uses ICP-based intra- and inter-modal alignment, and joint supervision uses color photometric loss on OM rays and depth loss on WLI rays.
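
The joint supervision can be sketched as a two-term loss. In the snippet below the L1 form of both terms and the weight `depth_weight` are assumptions made for illustration. The rendered colors and depths would come from a neural SDF renderer, which is not shown.

```python
import numpy as np

def joint_loss(pred_rgb, gt_rgb, pred_depth, gt_depth, depth_weight=1.0):
    """Photometric loss on OM rays plus depth loss on WLI rays.

    pred_rgb, gt_rgb     : (n_om_rays, 3) rendered and observed colors
    pred_depth, gt_depth : (n_wli_rays,) rendered and measured depths
    depth_weight         : hypothetical balance between the two terms
    """
    color_loss = np.abs(pred_rgb - gt_rgb).mean()
    depth_loss = np.abs(pred_depth - gt_depth).mean()
    return color_loss + depth_weight * depth_loss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    loss = joint_loss(rng.random((8, 3)), rng.random((8, 3)),
                      rng.random(16), rng.random(16), depth_weight=0.5)
    print(float(loss))
```

Each modality supervises only the quantity it measures well: OM rays constrain appearance, WLI rays constrain geometry.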

The final model separates view-independent (“diffuse”) from view-dependent (“specular”) color components, with only the former presented in the 3D mesh. OpticFusion achieves synthetic Chamfer Distance as low as 0.043 (vs. 0.793 for traditional Poisson on WLI only), and fills WLI voids with OM-informed geometry, supporting sub-nanometer accuracy with natural textures (Chen et al., 16 Jan 2025).
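
For reference, one common definition of the Chamfer Distance between two point sets is sketched below (symmetric, mean nearest-neighbor Euclidean distance). Conventions differ across papers, for example squared distances or a factor of one half, so this should not be read as the exact metric behind the numbers above.

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer Distance between point sets p (n, 3) and q (m, 3)."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)   # (n, m) pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.random((100, 3))
    reconstruction = reference + 0.01 * rng.normal(size=(100, 3))
    print(float(chamfer_distance(reconstruction, reference)))
```

The brute-force pairwise matrix is fine for small samples; for dense meshes a spatial index such as a k-d tree would normally replace it.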

5. Comparative Summary and Key Outcomes

OptFusion methods distinguish themselves across application domains via:

Principled fusion learning: Gradient-based architecture and connection search (deep CTR prediction), data fusion layers in telecom networks, and low-rank or tensorized multimodal integration in optics.
Optimal alignment: Soft, data-driven matching of latent feature or parameter spaces via optimal transport, outperforming naïve merging or fixed reference alignments in model fusion tasks.
Hardware and efficiency advantages: Photonic and quantum optical systems leverage OptFusion protocols for hardware-efficient, high-fidelity entangled-state generation, or energy-efficient deep learning inference.
Empirical validation: Accuracy improvements in cross-model fusion (about 10 percentage points over CLIP-only in transductive ZSL) and CTR prediction ($+0.1$ to $0.4$ AUC percentage points), memory and compute reductions (51× to 92× over classical networks), and higher success probabilities (up to 723× in prime-$d$ quantum fusion).

These results underscore the technical merits of OptFusion’s data-driven, optimal, or automated fusion paradigm as a foundational approach to multimodal, multicomponent, and cross-model integration across deep learning, quantum information, optics, and communications.