CoEx Framework in Vision & Quantum Systems

Updated 3 December 2025
  • CoEx is a dual-domain framework that applies correlation and excitation principles in real-time stereo matching and quantum coherent excitation.
  • In computer vision, it combines a MobileNetV2 feature backbone, correlation-based cost-volume construction, image-guided cost-volume excitation applied at multiple scales, and top-k soft-argmin regression for efficient disparity estimation.
  • In quantum circuits, CoEx models counter-rotating processes enabling a single photon to excite two qubits, driving innovative experiments in circuit QED.

The Correlate-and-Excite (CoEx) framework denotes distinct technical concepts in two separate domains: real-time stereo matching in computer vision and coherent excitation transfer in superconducting quantum circuits. In computer vision, CoEx describes an end-to-end stereo-matching architecture employing image-guided cost-volume excitation and a top-k soft-argmin disparity regression. In quantum physics, CoEx refers to the interaction regime whereby a single photon can simultaneously excite two qubits via counter-rotating terms in a superconducting circuit. Both paradigms leverage "correlation" and "excitation" as core operational themes, albeit under different physical and algorithmic principles.

1. CoEx in Real-Time Stereo Matching: Pipeline and Computational Structure

The stereo-matching CoEx architecture (Bangunharcana et al., 2021) is organized as a three-stage pipeline. It begins with feature extraction from rectified stereo image pairs $(L, R) \in \mathbb{R}^{3 \times H \times W}$, employing a lightweight MobileNetV2 backbone followed by a U-Net–style upsampling path with skip connections. Multi-scale feature maps $I^{(s)}$ are constructed for $s \in \{2,3,4,5\}$, corresponding to downsampling factors of 4–32.
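
The sketch below is an assumed illustration, not the authors' code: it collects multi-scale feature maps from a torchvision MobileNetV2 backbone by recording the output each time the spatial resolution halves. The class name `MultiScaleBackbone` is hypothetical, and the U-Net–style decoder with skip connections is omitted.

```python
# Hypothetical multi-scale feature collection from MobileNetV2 (torchvision >= 0.13).
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class MultiScaleBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.blocks = mobilenet_v2(weights=None).features  # inverted-residual stages

    def forward(self, x):
        feats, prev_h = [], x.shape[-2]
        for block in self.blocks:
            x = block(x)
            if x.shape[-2] < prev_h:      # resolution halved: start a new scale
                feats.append(x)
                prev_h = x.shape[-2]
            else:                         # same scale: keep the deepest map
                feats[-1] = x
        return feats                      # feature maps at strides 2, 4, 8, 16, 32

pyramid = MultiScaleBackbone()(torch.randn(1, 3, 288, 576))
print([tuple(f.shape) for f in pyramid])
```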

Cost-volume construction is performed at ¼ resolution, using correlation between left and right features. Given $f_L, f_R$, the correlation cost volume $C^{\text{corr}}(d,x,y) = \langle f_L(\cdot,x,y),\, f_R(\cdot,x-d,y)\rangle$ is computed for $D = 192$ disparities, producing $C^{\text{corr}} \in \mathbb{R}^{D \times H/4 \times W/4}$.
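
A minimal sketch of this correlation step follows (an assumed implementation, not the released code). If the 192-disparity budget refers to full resolution, the ¼-resolution volume would hold 48 hypotheses; the function name and the shapes in the example are placeholders.

```python
import torch

def correlation_cost_volume(f_left: torch.Tensor, f_right: torch.Tensor,
                            max_disp: int) -> torch.Tensor:
    """Inner-product correlation: cost[b, d, y, x] = <f_L(:, y, x), f_R(:, y, x - d)>."""
    B, C, H, W = f_left.shape
    cost = f_left.new_zeros(B, max_disp, H, W)
    for d in range(max_disp):
        if d == 0:
            cost[:, d] = (f_left * f_right).sum(dim=1)
        else:
            # Shift right features by d pixels; columns with no valid match stay zero.
            cost[:, d, :, d:] = (f_left[..., d:] * f_right[..., :-d]).sum(dim=1)
    return cost  # shape (B, D, H/4, W/4) when fed quarter-resolution features

fL, fR = torch.randn(1, 96, 72, 144), torch.randn(1, 96, 72, 144)
print(correlation_cost_volume(fL, fR, max_disp=48).shape)  # torch.Size([1, 48, 72, 144])
```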

Cost-volume aggregation utilizes an hourglass arrangement of 3D convolutions, interleaved with Guided Cost-volume Excitation (GCE) modules (section 2). Disparity regression applies a softmax along the disparity dimension, but instead of the standard soft-argmin, CoEx applies a top-k soft-argmin strategy (section 3). The final disparity estimate is computed at ¼ resolution and then upsampled to full resolution via a learned “superpixel” upsampling module.

2. Guided Cost-volume Excitation (GCE) Mechanism

Guided Cost-volume Excitation (GCE) introduces a channel-wise modulation of cost-volume features using 2D left-image features. Let $C_{\text{in}}^{(s)} \in \mathbb{R}^{D_s \times H_s \times W_s \times C_s}$ be the cost volume and $I^{(s)} \in \mathbb{R}^{C_I \times H_s \times W_s}$ the corresponding image feature map.

A lightweight guidance network applies a $1 \times 1$ convolution, producing a channel weight map $\alpha(x,y) = \sigma(W_g \cdot I^{(s)}(\cdot,x,y) + b_g)$, where $\sigma$ is the sigmoid function, $W_g \in \mathbb{R}^{C_s \times C_I}$, and $b_g \in \mathbb{R}^{C_s}$. Channel excitation then scales the cost volume as $C_{\text{out}}^{(s)}(d,x,y,c) = \alpha(x,y,c) \cdot C_{\text{in}}^{(s)}(d,x,y,c)$, with the weights broadcast across all disparity indices.

This channel-wise guidance injects salient image cues at a fraction of the computational cost of spatially varying neighborhood aggregation.
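
A minimal PyTorch sketch of a GCE module is given below (an assumed reimplementation; the module name and the (B, C, D, H, W) cost-volume layout follow PyTorch 3D-convolution conventions rather than the indexing used in the text).

```python
import torch
import torch.nn as nn

class GCE(nn.Module):
    """Guided cost-volume excitation: image features -> per-channel sigmoid weights."""
    def __init__(self, image_channels: int, cost_channels: int):
        super().__init__()
        self.guide = nn.Sequential(
            nn.Conv2d(image_channels, cost_channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, cost: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
        # cost:       (B, C, D, H, W) aggregated cost-volume features
        # image_feat: (B, C_I, H, W) left-image features at the same scale
        alpha = self.guide(image_feat)        # (B, C, H, W)
        return cost * alpha.unsqueeze(2)      # broadcast the weights over disparity

cost = torch.randn(1, 32, 48, 72, 144)
feat = torch.randn(1, 96, 72, 144)
print(GCE(image_channels=96, cost_channels=32)(cost, feat).shape)
```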

3. Top-k Soft-Argmin Disparity Regression

Standard disparity regression utilizes a soft-argmin operation over the probability volume $P(d,x,y) = \text{softmax}_d(-C_{\text{out}}(d,x,y))$, yielding $\hat{d} = \sum_{d=0}^{D-1} d \cdot P(d,x,y)$. This technique struggles under multi-modal or flat distributions, resulting in a biased average.

CoEx introduces a top-k variant: for each pixel $(x,y)$, extract the $K$ largest elements of $-C_{\text{out}}(d)$, mask the others, and re-normalize. The probability distribution becomes:

$$P_{\text{top}}(d) = \begin{cases} \dfrac{\exp(-C_{\text{out}}(d))}{\sum_{r \in S} \exp(-C_{\text{out}}(r))}, & d \in S \\ 0, & \text{otherwise} \end{cases}$$

where $S$ is the set of top-$K$ indices. Regression is $\hat{d}(x,y) = \sum_{d \in S} d \cdot P_{\text{top}}(d)$. Experimental results favor $K=2$ as the best trade-off. The filtering effect sharpens the estimate under multi-modal scenarios.
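
A vectorized sketch of this regression is shown below (assumed implementation); taking a softmax over the top-k values is mathematically identical to masking the full distribution and re-normalizing.

```python
import torch

def topk_soft_argmin(cost: torch.Tensor, k: int = 2) -> torch.Tensor:
    # cost: (B, D, H, W) aggregated matching cost (lower means a better match)
    topk_vals, topk_idx = (-cost).topk(k, dim=1)     # keep the K best candidates
    probs = torch.softmax(topk_vals, dim=1)          # re-normalize over the top-k set
    return (probs * topk_idx.float()).sum(dim=1)     # expected disparity, (B, H, W)

cost = torch.randn(1, 48, 72, 144)
print(topk_soft_argmin(cost, k=2).shape)  # torch.Size([1, 72, 144])
```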

4. Training Protocol and Implementation Details

Pretraining is conducted on the SceneFlow “finalpass” dataset (35,400 train / 4,300 test images), with maximum disparity 192 and input crops of $576 \times 288$. Fine-tuning utilizes the KITTI 2012 and 2015 datasets with sparse LiDAR ground truth, employing a split of 90% training and 10% validation.

The per-pixel loss utilizes a smooth $L_1$ penalty between predicted and ground-truth disparity. Optimization employs Adam ($\beta_1 = 0.9$, $\beta_2 = 0.999$) with Stochastic Weight Averaging (SWA). Learning-rate schedules are dataset-specific: SceneFlow (10 epochs with learning-rate decay) and KITTI (800 epochs with progressive decay). Data augmentation includes random flips, color jitter, and cropping. The model is implemented in PyTorch and evaluated on RTX 2080Ti hardware; the parameter count is approximately 2.7 million.
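
The fine-tuning recipe above can be sketched as follows; the `model` and `loader` objects, the learning rate, and the validity-mask threshold are placeholders rather than values confirmed by the paper.

```python
import torch
from torch.optim.swa_utils import AveragedModel

def finetune(model, loader, epochs=800, lr=1e-3, max_disp=192, device="cuda"):
    model = model.to(device)
    criterion = torch.nn.SmoothL1Loss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, betas=(0.9, 0.999))
    swa_model = AveragedModel(model)          # running average of weights (SWA)

    for epoch in range(epochs):
        for left, right, gt_disp in loader:   # rectified pair + sparse LiDAR disparity
            left, right, gt_disp = left.to(device), right.to(device), gt_disp.to(device)
            pred = model(left, right)
            valid = (gt_disp > 0) & (gt_disp < max_disp)   # supervise valid pixels only
            loss = criterion(pred[valid], gt_disp[valid])
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        swa_model.update_parameters(model)
    # BatchNorm statistics of swa_model would still need a final recalibration pass.
    return swa_model
```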

5. Empirical Performance and Ablation Analyses

CoEx achieves leading accuracy for real-time stereo matching, outperforming previous fast architectures. SceneFlow End-Point Error (EPE) is 0.69 (state-of-the-art in the real-time regime), compared to AANet+ at 0.72 and GANet-deep at 0.84. On KITTI datasets, CoEx's metrics (KITTI2012 3px: 1.93%, KITTI2015 D1-all: 2.13%) match or exceed other real-time networks and approach heavier state-of-the-art results.

Runtime benchmarking on RTX 2080Ti yields a total inference time of 27 ms (feature extraction: 10 ms, cost-volume/aggregation/regression: 17 ms). Competing architectures realize substantially higher latencies (AANet+ 80 ms, LEAStereo 475 ms).

Ablation studies report:

  • GCE reduces EPE from 1.05 (correlation-only baseline) to 0.69 when applied at all scales.
  • Channel excitation is more effective than additive skip-links.
  • Versus graph convolution-based spatial aggregation, GCE is both faster and more accurate.
  • Top-$k$ regression (best at $k=2$) improves EPE to 0.69 versus 0.85 for full soft-argmin.
  • Training with the chosen $k$ is necessary for gains; switching $k$ only at test time does not yield improvement.

6. CoEx in Quantum Circuits: Counter-Rotating Excitation Mechanism

In the quantum-circuit context (Wang et al., 2017), CoEx describes coherent processes that allow a single photon to simultaneously excite two qubits via counter-rotating terms in the Hamiltonian. This process is realized in flux-qubit systems longitudinally coupled to a resonator, with direct dipole-dipole qubit coupling $J$.

The total Hamiltonian is:

$$H_T = H_0 + H_{\text{int}}$$

where $H_0 = \omega a^\dagger a + \tfrac{1}{2}\Delta_1 \sigma_1^z + \tfrac{1}{2}\Delta_2 \sigma_2^z$ and $H_{\text{int}} = g_1 \sigma_1^z (a + a^\dagger) + g_2 \sigma_2^z (a + a^\dagger) + J \sigma_1^x \sigma_2^x$. The crucial counter-rotating term $H_{\text{CR}} = J(\sigma_1^+ \sigma_2^+ + \sigma_1^- \sigma_2^-)$ survives only when moving beyond the rotating-wave approximation.

Under the resonance condition $\omega \approx \Delta_1 + \Delta_2$, a polaron transformation followed by mode selection yields an effective Hamiltonian:

$$H_{\text{eff}} = G_s \left( a\,\sigma_1^+ \sigma_2^+ + a^\dagger \sigma_1^- \sigma_2^- \right), \qquad G_s = 2J(\beta_1 + \beta_2)$$

with $\beta_j = g_j / \omega$. This tripartite regime permits both adiabatic Landau–Zener sweeps and vacuum Rabi oscillations between the states $|1,gg\rangle$ and $|0,ee\rangle$. The transition probability and joint-excitation dynamics are directly set by $G_s$.
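
A small numerical sketch (not from the paper, with an arbitrary illustrative value of $G_s$) shows the vacuum Rabi oscillation implied by $H_{\text{eff}}$: starting from $|1,gg\rangle$, the population of $|0,ee\rangle$ follows $\sin^2(G_s t)$.

```python
import numpy as np
from scipy.linalg import expm

n_max = 3                     # photon-number cutoff
G_s = 2 * np.pi * 0.01        # effective coupling, arbitrary units (hbar = 1)

# Single-subsystem operators; qubit basis: |g> = (1, 0), |e> = (0, 1).
a = np.diag(np.sqrt(np.arange(1, n_max)), k=1)   # photon annihilation operator
g, e = np.array([1.0, 0.0]), np.array([0.0, 1.0])
sp = np.outer(e, g)                              # sigma^+ = |e><g|
sm = sp.T                                        # sigma^-

def kron3(A, B, C):
    return np.kron(np.kron(A, B), C)

# H_eff = G_s (a s1+ s2+ + a^dag s1- s2-): one photon absorbed, both qubits excited.
H_eff = G_s * (kron3(a, sp, sp) + kron3(a.conj().T, sm, sm))

fock0, fock1 = np.eye(n_max)[0], np.eye(n_max)[1]
psi0 = kron3(fock1, g, g)      # initial state |1, g, g>
target = kron3(fock0, e, e)    # target state  |0, e, e>

for t in np.linspace(0.0, np.pi / G_s, 5):
    psi_t = expm(-1j * H_eff * t) @ psi0
    print(f"t = {t:7.2f}   P(|0,ee>) = {abs(np.vdot(target, psi_t))**2:.3f}")
```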

Critical to this process are interference effects based on the relative strengths and signs of the longitudinal couplings $g_1$ and $g_2$. Adjusting the coupling polarity results in constructive or destructive interference, which modulates the effective transition amplitude $G_s$.

Experimental schemes demonstrate that CoEx is a pure counter-rotating phenomenon, with all necessary ingredients (moderate coupling, single-photon drive, qubit-resonator readout) accessible with contemporary circuit QED setups.

7. Technical Pseudocode and Structural Summary

The stereo-matching CoEx architecture can be summarized in pseudocode:

# GCE module (one scale)
Input: cost_in[D,H,W,C], feat_I[C_I,H,W]
α = sigmoid(conv1x1(feat_I))                  # 1×1 conv maps C_I -> C channels; α shape = (C,H,W)
cost_out(d,x,y,c) = α(c,x,y) * cost_in(d,x,y,c)   # weights broadcast over disparity d
return cost_out

# Top-k soft-argmin (one pixel)
Input: cost_out[D]                            # aggregated cost at one (x,y)
neg = -cost_out
S = top_k_indices(neg, K)
weights = zeros(D)
weights[S] = exp(neg[S])
weights = weights / sum(weights)
d_hat = sum_{d in S} d * weights[d]
return d_hat

# Full CoEx pipeline
f_L, f_R = FeatureExtract(L, R)
C = CorrelateBuild(f_L, f_R)
for each 3D block i:
    C = 3DConvBlock_i(C)
    C = GCE_i(C, I^{(s_i)})
# a full soft-argmin would use P = softmax(-C); CoEx applies the top-k variant instead
d_quarter = top_k_soft_argmin(C, K=2)
d_full = UpsampleWithSuperpixel(d_quarter, L) # learned 3×3 weighting
return d_full

A plausible implication is that the CoEx paradigm, grounded in the essential operation of correlating representations and then selectively exciting salient channels or states, reflects a broader computational and physical principle: efficient selection and enhancement via guided correlation operates effectively across both data-driven inference tasks and quantum dynamical systems.
