HQF-Net: A Hybrid Quantum-Classical Multi-Scale Fusion Network for Remote Sensing Image Segmentation

Published 8 Apr 2026 in cs.CV and cs.AI | (2604.06715v1)

Abstract: Remote sensing semantic segmentation requires models that can jointly capture fine spatial details and high-level semantic context across complex scenes. While classical encoder-decoder architectures such as U-Net remain strong baselines, they often struggle to fully exploit global semantics and structured feature interactions. In this work, we propose HQF-Net, a hybrid quantum-classical multi-scale fusion network for remote sensing image segmentation. HQF-Net integrates multi-scale semantic guidance from a frozen DINOv3 ViT-L/16 backbone with a customized U-Net architecture through a Deformable Multiscale Cross-Attention Fusion (DMCAF) module. To enhance feature refinement, the framework further introduces quantum-enhanced skip connections (QSkip) and a Quantum bottleneck with Mixture-of-Experts (QMoE), which combines complementary local, global, and directional quantum circuits within an adaptive routing mechanism. Experiments on three remote sensing benchmarks show consistent improvements with the proposed design. HQF-Net achieves 0.8568 mIoU and 96.87% overall accuracy on LandCover.ai, 71.82% mIoU on OpenEarthMap, and 55.28% mIoU with 99.37% overall accuracy on SeasoNet. An architectural ablation study further confirms the contribution of each major component. These results show that structured hybrid quantum-classical feature processing is a promising direction for improving remote sensing semantic segmentation under near-term quantum constraints.

Summary

  • The paper proposes a hybrid quantum-classical model integrating a DINOv3 backbone with quantum-enhanced skip connections (QSkip) and a quantum mixture-of-experts bottleneck to improve segmentation.
  • It introduces a deformable multi-scale cross-attention fusion module (DMCAF) that aligns global Transformer features with local U-Net details, achieving significant mIoU gains on benchmarks.
  • The study validates HQF-Net on LandCover.ai, OpenEarthMap, and SeasoNet, demonstrating clear benefits in spatial consistency, boundary precision, and overall accuracy.

HQF-Net: Hybrid Quantum-Classical Multi-Scale Fusion Network for Remote Sensing Image Segmentation

Problem Motivation and Architectural Rationale

HQF-Net addresses semantic segmentation of remote sensing imagery, a domain characterized by complex multi-scale spatial structures, high intra-class variance, and the need for precise boundary delineation. Convolutional encoder-decoder architectures such as U-Net perform robustly, but they are limited in capturing global semantics and structured feature interactions, particularly in heterogeneous remote sensing scenes. Vision Transformers (ViTs) trained with self-supervised strategies (e.g., DINOv3) provide rich visual representations but lack fine spatial refinement. Quantum Machine Learning (QML) modules, which exploit superposition and entanglement, can in principle model complex feature interactions in high-dimensional Hilbert spaces, although practical integration is constrained by the limitations of NISQ hardware.

HQF-Net proposes a hybrid quantum-classical approach: a customized U-Net augmented with multi-scale cross-attention fusion, quantum-enhanced skip connections (QSkip), and a Quantum Mixture-of-Experts bottleneck (QMoE). The model is designed for structured feature fusion and quantum-assisted refinement within the limits imposed by near-term quantum hardware.

Core Components and Methodological Innovations

HQF-Net integrates several technical innovations to enhance segmentation capacity:

  • Multi-scale Semantic Guidance: The encoder combines a frozen DINOv3 ViT-L/16 backbone (300M parameters) for global semantics with classical feature extraction. Multi-scale intermediate features are fused using a Deformable Multiscale Cross-Attention Fusion (DMCAF) module, which aligns high-level transformer representations with local U-Net features via deformable attention and FiLM-gated residual injection.
  • Quantum-Enhanced Skip Connections (QSkip): Parameterized quantum circuits are used within skip connection pathways, enabling refined channel-wise recalibration based on global quantum feature transformations. Unlike standard squeeze-and-excitation blocks, QSkip leverages quantum descriptors extracted from compressed spatial feature tensors, providing enhanced inter-channel correlation modeling.
  • Quantum Mixture-of-Experts (QMoE) Bottleneck: The encoder output is compressed and encoded into a quantum state, then processed through an enrichment multiscale circuit for hierarchical feature extraction. A classical gating network adaptively routes across three specialized quantum expert circuits (local, global, diagonal). Each expert captures complementary spatial, contextual, or directional dependencies, and their outputs are aggregated to form the final bottleneck feature.
  • Classical Decoder with Quantum-Enhanced Feature Fusion: Spatial upsampling and convolutional blocks reconstruct segmentation masks, leveraging quantum-refined skip and bottleneck features.

    Figure 1: Overview of Quantum Circuits used in HQF-Net - a) Enrichment Multiscale Circuit, b) Local Circuit, c) Global Circuit, and d) Diagonal Circuit.
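The adaptive routing in the QMoE bottleneck can be illustrated with a minimal sketch. It assumes a dense softmax gate and a weighted sum over the three expert outputs; the summary does not specify the gating function or the aggregation rule, so both are assumptions, and the function names and numeric values below are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of gate logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_experts(gate_logits, expert_outputs):
    """Fuse expert outputs with softmax gate weights (assumed aggregation rule).

    gate_logits: one score per expert, produced by a classical gating network.
    expert_outputs: one feature vector per expert, all of equal length.
    """
    weights = softmax(gate_logits)
    dim = len(expert_outputs[0])
    fused = [
        sum(w * out[i] for w, out in zip(weights, expert_outputs))
        for i in range(dim)
    ]
    return weights, fused

# Hypothetical descriptors standing in for the local, global and diagonal experts.
expert_outputs = [
    [0.2, -0.1, 0.5, 0.0],   # local expert
    [0.4, 0.3, -0.2, 0.1],   # global expert
    [-0.3, 0.6, 0.1, 0.2],   # diagonal expert
]
gate_logits = [1.0, 0.5, -0.5]  # hypothetical gating-network scores

weights, fused = route_experts(gate_logits, expert_outputs)
print(weights, fused)
```

With equal gate logits the fused vector reduces to the plain average of the expert outputs, which is a convenient sanity check for any routing implementation.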

Quantum Circuit Design

HQF-Net's quantum circuits are designed for dense prediction tasks:

  • Enrichment Multi-Scale Circuit: Models both local spatial dependencies via gridwise entanglement and global contextual mixing.
  • Localist, Globalist, Diagonal Experts: Localist circuits entangle neighboring qubits, Globalist circuits use broad entanglement to model non-local patterns, and Diagonal circuits target structured directional relationships. These quantum blocks are built from parameterized rotation gates and CNOT gates.
  • Two-Qubit Quantum Filter: Serves as a core interaction primitive within the local expert circuit.

    Figure 2: Two-qubit parameterized quantum filter used as a local interaction block in HQF-Net. The unit models pairwise feature interactions and serves as a basic building block within the larger quantum circuit designs.
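To make the idea of a two-qubit parameterized filter concrete, the sketch below simulates one on a statevector using only the standard library. The gate layout (RY angle encoding of two input features, one CNOT, then one trainable RY per qubit, read out as Pauli-Z expectations) is an assumption for illustration; the summary states only that the circuits use parameterized rotations and CNOT gates, so this is not the paper's exact circuit.

```python
import math

def apply_ry(state, theta, qubit):
    """Apply RY(theta) to `qubit` of a 2-qubit real statevector.

    Basis order is |q0 q1>, so qubit 0 is the most significant bit.
    """
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    bit = 1 << (1 - qubit)
    new = list(state)
    for i in range(4):
        if not i & bit:
            j = i | bit
            a0, a1 = state[i], state[j]
            new[i] = c * a0 - s * a1
            new[j] = s * a0 + c * a1
    return new

def apply_cnot(state):
    """CNOT with qubit 0 as control and qubit 1 as target (swaps |10> and |11>)."""
    return [state[0], state[1], state[3], state[2]]

def expect_z(state, qubit):
    """Expectation value of Pauli-Z on `qubit`."""
    bit = 1 << (1 - qubit)
    return sum((-1.0 if i & bit else 1.0) * a * a for i, a in enumerate(state))

def two_qubit_filter(x0, x1, theta0, theta1):
    """Assumed layout: encode (x0, x1), entangle, apply trainable rotations."""
    state = [1.0, 0.0, 0.0, 0.0]          # start in |00>
    state = apply_ry(state, x0, 0)        # angle-encode first feature
    state = apply_ry(state, x1, 1)        # angle-encode second feature
    state = apply_cnot(state)             # pairwise interaction
    state = apply_ry(state, theta0, 0)    # trainable rotation on qubit 0
    state = apply_ry(state, theta1, 1)    # trainable rotation on qubit 1
    return expect_z(state, 0), expect_z(state, 1)

# Hypothetical feature values and parameters.
print(two_qubit_filter(0.3, 1.1, 0.5, -0.2))
```

Because RY and CNOT have real matrix entries, the state stays real throughout and no complex arithmetic is needed. The two returned expectation values lie in [-1, 1] and would act as the filter's output features.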

Experimental Evaluation

Three challenging remote sensing benchmarks (LandCover.ai, OpenEarthMap, SeasoNet) were used for evaluation. HQF-Net demonstrates strong numerical gains over prior classical and quantum-inspired models:

LandCover.ai: HQF-Net achieves an mIoU of 0.8568 (85.68%) and an OA of 96.87%. Classical baselines reach at most about 0.75 mIoU and 87.05% OA, and quantum-inspired models score substantially lower.

OpenEarthMap: HQF-Net attains mIoU 71.82%, outperforming transformer-based SegFormer (66.0%) and PyramidMamba with Swin-B backbone (70.8%).

SeasoNet: HQF-Net delivers mIoU 55.28% and OA 99.37%, improving upon SegFormer and DeepLabv3 baselines.

Ablation studies confirm incremental gains from DMCAF fusion (+2-5% mIoU), QSkip (+6% mIoU), and QMoE bottleneck (+6% mIoU); the full model delivers the highest accuracy.
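For reference, the two reported metrics can be computed from a confusion matrix as in the sketch below. It uses the standard definitions of overall accuracy (OA) and mean intersection-over-union (mIoU); the paper's exact evaluation protocol, such as how ignored or absent classes are handled, is not given here, so skipping classes with an empty union is an assumption. The label values are made up for illustration.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts pixels with ground-truth class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def overall_accuracy(cm):
    """Fraction of all pixels that are classified correctly."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def mean_iou(cm):
    """Average of per-class TP / (TP + FP + FN), skipping classes with empty union."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp
        fn = sum(cm[c]) - tp
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious)

# Toy flattened label maps (hypothetical values), three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(overall_accuracy(cm), mean_iou(cm))
```

Both functions return fractions in [0, 1]; multiplying by 100 gives the percentage form used for most of the figures above.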

Qualitative Analysis

Qualitative segmentation visualizations reveal HQF-Net’s advantages in spatial consistency, fine boundary preservation, and accurate recognition of narrow structures (e.g., roads, building edges).

Figure 3: Qualitative segmentation on the LandCover.ai dataset, showing input images, ground-truth masks, and predictions by HQF-Net.

Comparisons across SeasoNet and OpenEarthMap indicate that HQF-Net produces cleaner object boundaries and reduces misclassification in heterogeneous or densely packed regions.

Figure 4: Qualitative segmentation results on the SeasoNet dataset showing original images, ground-truth masks, and comparisons with other models.

Figure 5: Qualitative segmentation results on the OpenEarthMap dataset showing original images, ground-truth masks, and comparisons with other models.

Figure 6: Qualitative segmentation results on the OpenEarthMap and SeasoNet datasets showing original images, ground-truth masks, and HQF-Net predictions.

Theoretical and Practical Implications

  • Hybrid Quantum-Classical Segmentation: HQF-Net exemplifies the practical utility of quantum circuits in dense segmentation pipelines, demonstrating that structured quantum feature refinement can yield consistent gains under current simulation constraints.
  • Model Specialization via Expert Routing: The QMoE architecture leverages sparse specialization, an increasingly prominent paradigm for achieving scalable performance in vision tasks, with added benefit from quantum expert diversity.
  • Quantum Module Design under NISQ Constraints: HQF-Net’s design strategies (feature compression, shallow circuit depth, expert routing) are well-adapted to current quantum hardware limitations, suggesting feasible paths for future deployment as quantum devices mature.

Limitations and Future Directions

  • Quantum Simulation Overhead: Training time remains significantly higher than classical counterparts; deployment on real quantum hardware (NISQ and beyond) is needed for practical scaling.
  • Gradient Flow Stability: Advanced circuit designs require meticulous initialization and differentiation (e.g., adjoint differentiation) to avoid barren plateaus and ensure convergence.
  • Broader Architectural Exploration: The design space for hybrid quantum-classical vision models is expansive; future research should systematically investigate circuit designs, backbone architectures, and attention mechanisms.
  • Application to Larger-Scale Benchmarks and Quantum Self-Supervision: Extending HQF-Net to more diverse and expansive datasets, as well as exploring quantum-native self-supervised feature learning, is a promising direction.
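As background for the gradient-flow point, the sketch below shows the parameter-shift rule on a single-qubit circuit. This is a different technique from the adjoint differentiation mentioned above, chosen here only because it fits in a few lines without a simulator library. The circuit (one RY rotation on |0>, measured in Pauli-Z) and the function names are illustrative assumptions, not part of HQF-Net.

```python
import math

def expectation_z(theta):
    """<Z> after applying RY(theta) to |0>.

    The state is cos(theta/2)|0> + sin(theta/2)|1>, so
    <Z> = cos^2(theta/2) - sin^2(theta/2) = cos(theta).
    """
    return math.cos(theta / 2) ** 2 - math.sin(theta / 2) ** 2

def parameter_shift_grad(f, theta):
    """Exact gradient for a rotation-gate parameter via two shifted evaluations."""
    shift = math.pi / 2
    return (f(theta + shift) - f(theta - shift)) / 2.0

theta = 0.3  # hypothetical parameter value
grad = parameter_shift_grad(expectation_z, theta)
print(grad, -math.sin(theta))  # the two values should agree
```

Unlike a finite-difference estimate, the shift is not small: the rule is exact for gates of this form and needs only two extra circuit evaluations per parameter, which is why it is also usable on real hardware where adjoint methods are unavailable.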

Conclusion

HQF-Net is a hybrid architecture for remote sensing semantic segmentation that integrates DINOv3-guided multi-scale fusion, quantum-enhanced skip refinement, and a structured QMoE bottleneck within a modified U-Net framework. Empirical evaluations on three datasets show improvements in both accuracy and spatial consistency, and the ablation study attributes incremental gains to each hybrid module. The framework demonstrates the viability of combining quantum and classical modules in dense prediction tasks and provides a foundation for future research on practical deployment and theory as quantum hardware matures (2604.06715).
