CoEx Framework in Vision & Quantum Systems
- CoEx is a dual-domain framework that applies correlation and excitation principles in real-time stereo matching and quantum coherent excitation.
- In computer vision, it utilizes MobileNetV2 features, multi-scale cost-volume construction, guided excitation, and a top-k soft-argmin for enhanced disparity estimation.
- In superconducting quantum circuits, CoEx refers to counter-rotating processes that let a single photon excite two qubits simultaneously, an effect within reach of circuit-QED experiments.
The Correlate-and-Excite (CoEx) framework denotes distinct technical concepts in two separate domains: real-time stereo matching in computer vision and coherent excitation transfer in superconducting quantum circuits. In computer vision, CoEx describes an end-to-end stereo-matching architecture employing image-guided cost-volume excitation and a top-k soft-argmin disparity regression. In quantum physics, CoEx refers to the interaction regime whereby a single photon can simultaneously excite two qubits via counter-rotating terms in a superconducting circuit. Both paradigms leverage "correlation" and "excitation" as core operational themes, albeit under different physical and algorithmic principles.
1. CoEx in Real-Time Stereo Matching: Pipeline and Computational Structure
The stereo-matching CoEx architecture (Bangunharcana et al., 2021) is organized as a three-stage pipeline. It begins with feature extraction from a rectified stereo image pair $(I_L, I_R)$, employing a lightweight backbone (MobileNetV2) followed by a U-Net–style upsampling path with skip connections. Multi-scale feature maps $f^{(s)}$ are constructed for $s \in \{4, 8, 16, 32\}$, corresponding to downsample factors of 4–32.
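Cost-volume construction is performed at $s = 4$ (¼ resolution), using correlation between left and right features. Given $f_L^{(4)}, f_R^{(4)} \in \mathbb{R}^{C \times H/4 \times W/4}$, the correlation cost volume is derived for $D/4$ disparities, where $D$ is the maximum disparity, producing a volume of size $D/4 \times H/4 \times W/4$.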
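The correlation step can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function name `correlation_cost_volume`, the nested-list layout `[c][y][x]`, and the choice of a channel-averaged dot product as the correlation measure are all assumptions made for the example.

```python
def correlation_cost_volume(feat_left, feat_right, max_disp):
    """Build a correlation volume indexed [d][y][x] (hypothetical helper).

    feat_left / feat_right are nested lists indexed [c][y][x].
    Entry (d, y, x) is the channel-averaged dot product between the left
    feature at column x and the right feature at column x - d. Positions
    where x - d falls outside the image are left at 0.0.
    """
    channels = len(feat_left)
    height = len(feat_left[0])
    width = len(feat_left[0][0])
    volume = [[[0.0] * width for _ in range(height)] for _ in range(max_disp)]
    for d in range(max_disp):
        for y in range(height):
            for x in range(d, width):
                dot = sum(feat_left[c][y][x] * feat_right[c][y][x - d]
                          for c in range(channels))
                volume[d][y][x] = dot / channels
    return volume


# Toy input: one channel, one row; the feature at left column 2
# reappears at right column 1, i.e. a disparity of 1.
left = [[[0.0, 0.0, 1.0, 0.0]]]
right = [[[0.0, 1.0, 0.0, 0.0]]]
volume = correlation_cost_volume(left, right, max_disp=3)
```

In this toy case the correlation at column 2 peaks at `d = 1`, which is the shift between the two rows.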
Cost-volume aggregation utilizes an hourglass arrangement of 3D convolutions, interleaved with Guided Cost-volume Excitation (GCE) modules (Section 2). Disparity regression applies a softmax along the disparity dimension, but instead of the standard soft-argmin, CoEx applies a top-k soft-argmin strategy (Section 3). The final disparity estimate is computed at ¼ resolution and then upsampled to full resolution via a learned “superpixel” upsampling module.
2. Guided Cost-volume Excitation (GCE) Mechanism
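Guided Cost-volume Excitation (GCE) introduces a channel-wise modulation of cost-volume features using 2D left-image features. Let $C_i^{(s)}$ be the cost volume with $c$ feature channels at scale $s$, and $I^{(s)}$ the corresponding image feature map with $c_I$ channels.
A lightweight guidance network $F^{2D}$ applies a $1 \times 1$ convolution, producing a channel weight map $\alpha = \sigma(F^{2D}(I^{(s)}))$, where $\sigma$ is the sigmoid, $F^{2D}$ maps $c_I$ channels to $c$ channels, and $\alpha \in \mathbb{R}^{c \times H/s \times W/s}$. The channel excitation occurs by scaling: $C_o^{(s)} = \alpha \odot C_i^{(s)}$, with $\alpha$ broadcast across all disparity indices.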
This inexpensive channel-wise guidance injects salient image cues at far lower computational cost than spatially varying neighborhood aggregation.
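A minimal sketch of the excitation step follows, written with nested lists so it runs without any deep-learning library. The names `guided_excitation`, `weight`, and `bias` are hypothetical. The 1×1 convolution is modeled as the same linear map applied at every pixel, and the layouts `cost[d][y][x][c]` and `feat[c_i][y][x]` are assumptions chosen to mirror the shapes described above.

```python
import math


def sigmoid(v):
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def guided_excitation(cost, feat, weight, bias):
    """Scale cost-volume channels by image-derived weights (illustrative only).

    cost:   [D][H][W][C]  cost-volume features
    feat:   [C_I][H][W]   left-image features at the same scale
    weight: [C][C_I]      1x1-convolution weights (hypothetical parameters)
    bias:   [C]           1x1-convolution bias
    """
    n_d = len(cost)
    n_ci, n_c = len(feat), len(weight)
    height, width = len(feat[0]), len(feat[0][0])
    # alpha[c][y][x]: one weight per channel and pixel, shared by all disparities.
    alpha = [[[sigmoid(bias[c] + sum(weight[c][i] * feat[i][y][x]
                                     for i in range(n_ci)))
               for x in range(width)]
              for y in range(height)]
             for c in range(n_c)]
    # Broadcast alpha over the disparity axis.
    return [[[[alpha[c][y][x] * cost[d][y][x][c] for c in range(n_c)]
              for x in range(width)]
             for y in range(height)]
            for d in range(n_d)]


# Toy input: D=2, H=1, W=2, C=1, C_I=1.
cost = [[[[1.0], [1.0]]], [[[2.0], [2.0]]]]
feat = [[[0.0, 100.0]]]
excited = guided_excitation(cost, feat, weight=[[1.0]], bias=[0.0])
```

Because the weights depend only on channel and pixel, the extra cost is a single 2D convolution per scale rather than a spatially varying 3D aggregation.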
3. Top-k Soft-Argmin Disparity Regression
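Standard disparity regression utilizes a soft-argmin operation over the probability volume $P = \mathrm{softmax}(-C)$, taken along the disparity dimension, yielding $\hat{d} = \sum_{d} d \cdot P(d)$. When the distribution is multi-modal or flat, this expectation is pulled away from the true peak, resulting in a biased average.
CoEx introduces a top-k variant: for each pixel $(x, y)$, extract the $k$ largest elements of $P(\cdot, x, y)$, mask the others, and re-normalize. The probability distribution becomes:
$$\tilde{P}(d) = \begin{cases} \dfrac{P(d)}{\sum_{j \in S_k} P(j)}, & d \in S_k \\ 0, & \text{otherwise,} \end{cases}$$
where $S_k$ is the set of top-$k$ indices. Regression is $\hat{d} = \sum_{d \in S_k} d \cdot \tilde{P}(d)$. Experimental results favor $k = 2$ for the optimal trade-off. The filtering effect sharpens the estimate under multi-modal scenarios.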
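The difference between the two regressions can be illustrated for a single pixel. The sketch below assumes the cost convention used in this article (lower cost means a better match, so probabilities come from a softmax over negated costs). The function names and the toy cost profile are made up for the example.

```python
import math


def top_k_soft_argmin(cost, k):
    """Expected disparity over the k lowest-cost candidates of one pixel."""
    # The k lowest costs are the k highest softmax(-cost) probabilities.
    top = sorted(range(len(cost)), key=lambda d: cost[d])[:k]
    # Masking and re-normalizing the softmax equals a softmax restricted to
    # the retained entries. Subtract the minimum cost for numerical stability.
    c_min = min(cost[d] for d in top)
    weights = {d: math.exp(-(cost[d] - c_min)) for d in top}
    total = sum(weights.values())
    return sum(d * w for d, w in weights.items()) / total


def soft_argmin(cost):
    """Standard soft-argmin: keep every disparity candidate."""
    return top_k_soft_argmin(cost, len(cost))


# Toy profile: a clear minimum at d=3, a close runner-up at d=4,
# and a flat, uninformative tail over the remaining 46 candidates.
cost = [3.0] * 48
cost[3], cost[4] = 0.0, 0.5

full_estimate = soft_argmin(cost)
top2_estimate = top_k_soft_argmin(cost, k=2)
```

With this profile the full soft-argmin is dragged far into the flat tail, while the top-2 estimate stays between the two best candidates at 3 and 4.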
4. Training Protocol and Implementation Details
Pretraining is conducted on the SceneFlow “finalpass” dataset (35,400 train/4,300 test images), with maximum disparity 192 and cropped inputs. Fine-tuning utilizes the KITTI 2012 and 2015 datasets with sparse LiDAR ground truth, employing a split of 90% training and 10% validation.
The per-pixel loss is the Smooth $L_1$ loss between predicted and ground-truth disparity. Optimization employs Adam with Stochastic Weight Averaging (SWA). Learning rate schedules are dataset-specific: SceneFlow (10 epochs, learning rate decay) and KITTI (800 epochs, progressive decay). Data augmentation includes random flips, color jitter, and cropping. The model is implemented in PyTorch and evaluated on RTX 2080Ti hardware; the parameter count is 2.7 million.
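Because the KITTI ground truth is sparse, the loss has to skip pixels without a valid disparity. A minimal sketch of a masked Smooth L1 loss over flattened disparity lists is shown below. The validity rule (`0 < gt < max_disp`) and the threshold `beta = 1.0` are common conventions assumed for the example, not details confirmed by the source.

```python
def smooth_l1(diff, beta=1.0):
    """Quadratic near zero, linear for large errors."""
    a = abs(diff)
    if a < beta:
        return 0.5 * a * a / beta
    return a - 0.5 * beta


def masked_smooth_l1_loss(pred, gt, max_disp=192):
    """Mean Smooth L1 over pixels with valid ground truth.

    Assumed convention: a ground-truth value of 0 (or >= max_disp)
    marks a pixel with no usable label.
    """
    valid = [(p, g) for p, g in zip(pred, gt) if 0 < g < max_disp]
    if not valid:
        return 0.0
    return sum(smooth_l1(p - g) for p, g in valid) / len(valid)


# Third pixel has no ground truth and is ignored.
loss = masked_smooth_l1_loss([10.5, 20.0, 7.0], [10.0, 23.0, 0.0])
```

Here the two valid pixels contribute 0.125 (quadratic branch) and 2.5 (linear branch), giving a mean of 1.3125.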
5. Empirical Performance and Ablation Analyses
CoEx achieves leading accuracy for real-time stereo matching, outperforming previous fast architectures. SceneFlow End-Point Error (EPE) is 0.69 (state-of-the-art in the real-time regime), compared to AANet+ at 0.72 and GANet-deep at 0.84. On KITTI datasets, CoEx's metrics (KITTI2012 3px: 1.93%, KITTI2015 D1-all: 2.13%) match or exceed other real-time networks and approach heavier state-of-the-art results.
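For reference, the three metrics quoted above can be computed as in the following sketch over flattened disparity lists. The function names are hypothetical, and treating `gt > 0` as the validity test is an assumption. The D1 rule follows the KITTI 2015 definition, which counts a pixel as an outlier when its error exceeds both 3 px and 5% of the ground-truth disparity.

```python
def _valid_pairs(pred, gt):
    # Assumed convention: ground truth <= 0 means "no label".
    return [(p, g) for p, g in zip(pred, gt) if g > 0]


def end_point_error(pred, gt):
    """Mean absolute disparity error (EPE) in pixels."""
    pairs = _valid_pairs(pred, gt)
    return sum(abs(p - g) for p, g in pairs) / len(pairs)


def bad_pixel_rate(pred, gt, thresh=3.0):
    """Fraction of pixels whose error exceeds `thresh` pixels (3px metric)."""
    pairs = _valid_pairs(pred, gt)
    return sum(abs(p - g) > thresh for p, g in pairs) / len(pairs)


def d1_rate(pred, gt):
    """Fraction of pixels with error > 3 px and > 5% of the true disparity."""
    pairs = _valid_pairs(pred, gt)
    return sum(abs(p - g) > 3.0 and abs(p - g) > 0.05 * g
               for p, g in pairs) / len(pairs)


pred = [10.0, 20.0, 104.0]
gt = [10.5, 24.0, 100.0]
epe = end_point_error(pred, gt)
rate_3px = bad_pixel_rate(pred, gt)
rate_d1 = d1_rate(pred, gt)
```

In this example the third pixel is off by 4 px, which counts as an error under the 3px metric but not under D1, because 4 px is below 5% of its ground-truth disparity of 100.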
Runtime benchmarking on RTX 2080Ti yields a total inference time of 27 ms (feature extraction: 10 ms, cost-volume/aggregation/regression: 17 ms). Competing architectures realize substantially higher latencies (AANet+ 80 ms, LEAStereo 475 ms).
Ablation studies report:
- GCE reduces EPE from 1.05 (correlation-only baseline) to 0.69 when applied at all scales.
- Channel excitation is more effective than additive skip-links.
- Versus graph convolution-based spatial aggregation, GCE is both faster and more accurate.
- Top-$k$ regression (best at $k = 2$) improves EPE to 0.69 versus 0.85 for full soft-argmin.
- Training with the chosen $k$ is necessary for the gains; switching $k$ only at test time does not yield improvement.
6. CoEx in Quantum Circuits: Counter-Rotating Excitation Mechanism
In the quantum circuit context (Wang et al., 2017), CoEx describes coherent processes that allow a single photon to simultaneously excite two qubits via counter-rotating terms in the Hamiltonian. This process is realized in flux qubit systems longitudinally coupled to a resonator, with a direct dipole-dipole qubit coupling of strength $J$.
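The total Hamiltonian (with $\hbar = 1$) has the form:
$$H = \omega_r a^\dagger a + \sum_{j=1,2} \left[ \frac{\omega_j}{2} \sigma_z^{(j)} + g_j \sigma_z^{(j)} \left(a + a^\dagger\right) \right] + J \sigma_x^{(1)} \sigma_x^{(2)},$$
where $\omega_r$ and $\omega_j$ are the resonator and qubit frequencies, $g_j$ are the longitudinal qubit-resonator couplings, $J$ is the qubit-qubit coupling strength, and $\sigma_x^{(j)} = \sigma_+^{(j)} + \sigma_-^{(j)}$. The crucial counter-rotating term $J\left(\sigma_+^{(1)}\sigma_+^{(2)} + \sigma_-^{(1)}\sigma_-^{(2)}\right)$ survives when moving beyond the rotating-wave approximation.
Under the resonance condition $\omega_r \approx \omega_1 + \omega_2$, a polaron transformation followed by mode selection yields an effective Hamiltonian:
$$H_{\mathrm{eff}} = g_{\mathrm{eff}} \left( a\, \sigma_+^{(1)} \sigma_+^{(2)} + a^\dagger \sigma_-^{(1)} \sigma_-^{(2)} \right),$$
with an effective coupling $g_{\mathrm{eff}}$ determined by $J$ and the longitudinal couplings. This tripartite regime permits both adiabatic Landau-Zener sweeps and vacuum Rabi oscillations between states $|g, g, 1\rangle$ and $|e, e, 0\rangle$. The transition probability and joint-excitation dynamics are directly set by $g_{\mathrm{eff}}$.
Critical to this process are interference effects based on the relative strengths and signs of the longitudinal couplings $g_1$ and $g_2$. Adjusting coupling polarity results in constructive or destructive interference, which modulates the effective transition amplitude $g_{\mathrm{eff}}$.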
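The two dynamical regimes mentioned above can be illustrated with textbook two-level formulas. The sketch below treats the photon state and the doubly excited qubit state as an isolated two-level system coupled with an effective strength, called `g_eff` here. It assumes exact resonance, no losses, $\hbar = 1$, and (for the sweep) a detuning that changes linearly in time at `sweep_rate`. These are idealizations for illustration, not the model solved in the cited work.

```python
import math


def joint_excitation_probability(g_eff, t):
    """Probability of finding both qubits excited at time t.

    Starts with one photon and both qubits in the ground state;
    ideal vacuum Rabi oscillation at angular frequency 2 * g_eff.
    """
    return math.sin(g_eff * t) ** 2


def landau_zener_transfer_probability(g_eff, sweep_rate):
    """Probability of adiabatic transfer through the avoided crossing.

    Standard Landau-Zener result for a linear sweep of the detuning
    (rate in frequency units per unit time, hbar = 1).
    """
    return 1.0 - math.exp(-2.0 * math.pi * g_eff ** 2 / abs(sweep_rate))


g_eff = 2.0
t_transfer = math.pi / (2.0 * g_eff)  # first time of complete transfer
p_full = joint_excitation_probability(g_eff, t_transfer)
p_slow = landau_zener_transfer_probability(g_eff, sweep_rate=1e-3)
p_fast = landau_zener_transfer_probability(g_eff, sweep_rate=1e6)
```

A larger effective coupling shortens the transfer time and makes a sweep at a given rate more adiabatic, which is why the interference between the longitudinal couplings matters experimentally.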
Experimental schemes demonstrate that CoEx is a pure counter-rotating phenomenon, with all necessary ingredients (moderate coupling, single-photon drive, qubit-resonator readout) accessible with contemporary circuit QED setups.
7. Technical Pseudocode and Structural Summary
The stereo-matching CoEx architecture can be summarized in pseudocode:
```
# Guided Cost-volume Excitation (GCE)
Input: cost_in[D,H,W,C], feat_I[C_I,H,W]
α = sigmoid(conv1x1_channel(C_I→C)(feat_I))   # α shape = (C,H,W)
cost_out(d,x,y,c) = α(c,x,y) * cost_in(d,x,y,c)
return cost_out

# Top-k soft-argmin
Input: cost_out[D]   # at one (x,y)
neg = -cost_out
S = top_k_indices(neg, K)
weights = zeros(D)
weights[S] = exp(neg[S])
weights = weights / sum(weights)
d_hat = sum_{d∈S} d * weights[d]
return d_hat

# Full pipeline
f_L, f_R = FeatureExtract(L, R)
C = CorrelateBuild(f_L, f_R)
for each 3D block i:
    C = 3DConvBlock_i(C)
    C = GCE_i(C, I^{(s_i)})
d_quarter = top_k_soft_argmin(C, K=2)   # negation and softmax over the top-k entries happen inside
d_full = UpsampleWithSuperpixel(d_quarter, L)   # learned 3×3 weighting
return d_full
```
A plausible implication is that the CoEx paradigm, grounded in the essential operation of correlating representations and then selectively exciting salient channels or states, reflects a broader computational and physical principle: efficient selection and enhancement via guided correlation operates effectively across both data-driven inference tasks and quantum dynamical systems.