Continuous Cost Aggregation (CCA)
- Continuous Cost Aggregation (CCA) is an algorithmic framework that estimates continuous subpixel disparities from dual-pixel sensor data using local parabolic modeling and quadratic cost aggregation.
- The method employs path-wise quadratic aggregation with adaptive smoothness constraints to robustly refine disparity estimates, especially in low-texture or blurred regions.
- CCA leverages a multi-scale fusion strategy, combining coarse-scale priors with fine-scale data, which reduces memory requirements compared to traditional cost-volume approaches.
Continuous Cost Aggregation (CCA) is an algorithmic framework for extracting continuous, subpixel disparities from Dual-Pixel (DP) sensor data, leveraging local parabolic modeling, path-wise quadratic aggregation, and multi-scale coefficient fusion. CCA was introduced to address the challenge posed by DP images’ tiny baseline and non-uniform point spread function (PSF), which preclude conventional stereo matching algorithms from yielding accurate depth information. CCA combines closed-form subpixel disparity estimation within a semi-global matching (SGM) paradigm with efficient propagation of quadratic cost coefficients, enabling pixel-wise minimization without discrete winner-take-all steps and substantially reduced memory requirements relative to classical cost-volume approaches (Monin et al., 2023).
1. Local Parabolic Modeling of Pixelwise Matching Cost
CCA begins with rectified dual-pixel images and computes a discrete per-pixel cost volume C_int(p, d) for integer disparities d ∈ D, typically via metrics such as the Sum of Absolute Differences (SAD) or Normalized Cross-Correlation (NCC). Because the DP baseline is minute, subpixel accuracy is essential. Thus, for each pixel p, a parabola is fit locally around the integer cost minimum:
- Identify the integer minimizer d⁰_p = argmin_{d ∈ D} C_int(p, d)
- Fit a local parabola a_p x² + b_p x + c_p using the costs at d⁰_p − 1, d⁰_p, and d⁰_p + 1, with local coordinate x = d − d⁰_p
- Coefficient calculation: a_p = ½(C_int(p, d⁰_p+1) + C_int(p, d⁰_p−1) − 2 C_int(p, d⁰_p)), b_p = ½(C_int(p, d⁰_p+1) − C_int(p, d⁰_p−1)), c_p = C_int(p, d⁰_p)
- Transform back to global disparity to produce f_p(d) = α_p d² + β_p d + γ_p, where α_p = a_p, β_p = b_p − 2 a_p d⁰_p, and γ_p = a_p (d⁰_p)² − b_p d⁰_p + c_p
These parabolic models admit unique minima and allow the curvature α_p to serve as a confidence measure, with flatter parabolas used in ambiguous regions by scaling α_p down.
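The three-point fit and coordinate transform above can be sketched in a few lines of Python (a minimal illustration; the function name and the dict-based cost interface are ours, not the paper's):

```python
def fit_local_parabola(costs, d0):
    """Fit a parabola through the costs at d0-1, d0, d0+1 (local x = -1, 0, 1),
    then re-express it in global disparity coordinates as
    f(d) = alpha*d**2 + beta*d + gamma."""
    c_m, c_0, c_p = costs[d0 - 1], costs[d0], costs[d0 + 1]
    # Local fit a*x^2 + b*x + c over x in {-1, 0, 1}:
    a = 0.5 * (c_p + c_m - 2.0 * c_0)
    b = 0.5 * (c_p - c_m)
    c = c_0
    # Substitute x = d - d0 to obtain global-disparity coefficients:
    alpha = a
    beta = b - 2.0 * a * d0
    gamma = a * d0 ** 2 - b * d0 + c
    return alpha, beta, gamma
```

The subpixel minimum is then −β_p/(2 α_p); for costs sampled from a true parabola 2(d − 3.4)² + 1 around d⁰_p = 3, the fit recovers the minimum at 3.4 exactly.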
2. Path-wise Quadratic Aggregation and Smoothness Constraint
CCA applies semi-global matching by aggregating quadratic cost functions along scanline directions. For each path r, the propagated cost at pixel p remains a quadratic L_r(p, d) = A_p d² + B_p d + Γ_p, combining the local parabola f_p(d) with a quadratic smoothness penalty tied to the preceding pixel on the path. At pixel p:
- Let the predecessor's optimum be m = −B_{p−1} / (2 A_{p−1})
- Aggregate: L_r(p, d) = f_p(d) + P_adapt · (d − m)²
- Expanding the penalty, the coefficients update as A_p = α_p + P_adapt and B_p = β_p − 2 P_adapt · m = β_p + P_adapt · B_{p−1}/A_{p−1} (the constant Γ_p does not affect the minimizer)
The adaptive smoothness weight P_adapt = P · A_{p−1} · exp(−(I_p − I_{p−1})²/σ²) adjusts the smoothing strength according to local image gradients, decaying across large intensity differences and thereby preserving discontinuities at strong edges.
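A single left-to-right pass of this coefficient propagation might look as follows (a sketch under our own naming; the default values of P and sigma are illustrative, not the published settings):

```python
import numpy as np

def aggregate_scanline(alpha, beta, intensity, P=1.0, sigma=10.0):
    """Propagate quadratic cost coefficients left-to-right along one scanline.

    alpha, beta : per-pixel local parabola coefficients (1-D arrays).
    intensity   : grey-level values driving the edge-adaptive penalty.
    Returns per-pixel aggregated coefficients (A, B) for this path.
    """
    n = len(alpha)
    A = np.empty(n)
    B = np.empty(n)
    A[0], B[0] = alpha[0], beta[0]  # first pixel: data term only
    for p in range(1, n):
        # The penalty P_adapt * (d - m)^2 pulls the estimate toward the
        # predecessor's optimum m = -B[p-1] / (2 A[p-1]); it weakens
        # across strong intensity edges.
        grad = intensity[p] - intensity[p - 1]
        P_adapt = P * A[p - 1] * np.exp(-grad ** 2 / sigma ** 2)
        A[p] = alpha[p] + P_adapt
        B[p] = beta[p] + P_adapt * (B[p - 1] / A[p - 1])
    return A, B
```

When every pixel shares the same local minimum, propagation leaves the per-path minimum −B/(2A) unchanged, as expected of a consistency-preserving smoothness term.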
3. Closed-form Subpixel Disparity Extraction
After aggregation over all R directions, each pixel p has quadratic costs L_r(p, d) = A_p^r d² + B_p^r d + Γ_p^r. Summation yields a total quadratic cost S(p, d) = A_p d² + B_p d + Γ_p, with A_p = Σ_r A_p^r and B_p = Σ_r B_p^r.
The subpixel disparity is then extracted in closed form by solving for the minimum: d*_p = −B_p / (2 A_p).
This direct coefficient inversion eliminates the need for discrete label selection and enables efficient pixel-wise minimization.
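Summing the per-path coefficients and inverting is essentially a one-liner (illustrative naming; the constant Γ terms are dropped because they do not move the argmin):

```python
def extract_disparity(path_coeffs):
    """Sum per-path quadratics A_r*d**2 + B_r*d (+ const) and minimize
    in closed form. Works on scalars and elementwise on NumPy arrays."""
    A = sum(a for a, _ in path_coeffs)
    B = sum(b for _, b in path_coeffs)
    return -B / (2.0 * A)  # pixel-wise quadratic minimum, no WTA step
```

For example, two paths with equal curvature and minima at 2 and 4 fuse to a minimum at 3, the curvature-weighted average.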
4. Multi-scale Aggregation and Pyramid Fusion
CCA implements a multi-scale strategy to enhance robustness against depth-dependent defocus blur. Using an image pyramid with scales s = 1, …, S and a factor F between adjacent levels:
- Run CCA at the coarsest scale s = S to obtain coefficient maps A_p, B_p.
- Upsample priors for the next finer scale: A_prior = Upsample(A_p)/F², B_prior = Upsample(B_p)/F, re-expressing the quadratic in fine-scale disparity units (d_fine = F · d_coarse).
- Inject the priors with weight w before fine-scale aggregation: α_p ← α_p + w · A_prior(p), β_p ← β_p + w · B_prior(p).
- Re-run CCA down to full resolution.
In areas with weak texture or blur, the coarse-scale belief steers aggregation; in regions with high local confidence, fine-scale data predominate.
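The prior rescaling can be sketched as below (an assumption-laden illustration: nearest-neighbour upsampling via `np.kron` stands in for whatever interpolation the published implementation uses; the 1/F² and 1/F factors follow from substituting d_coarse = d_fine/F into A d² + B d):

```python
import numpy as np

def upsample_priors(A_coarse, B_coarse, F=2):
    """Re-express coarse-scale quadratic coefficients in fine-scale
    disparity units (d_fine = F * d_coarse) and upsample them spatially."""
    up = np.ones((F, F))
    A_fine = np.kron(A_coarse, up) / F ** 2  # curvature shrinks by F^2
    B_fine = np.kron(B_coarse, up) / F       # linear term shrinks by F
    return A_fine, B_fine
```

With this scaling, the minimum −B/(2A) of the rescaled quadratic lands at F times the coarse-scale minimum, so the prior points at the correct fine-scale disparity.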
5. Algorithmic Structure
CCA proceeds as follows:
```
for s = S down to 1:
    # 2. compute sparse cost volume -> local parabolas
    for each pixel p:
        compute C_int(p, d) for d in D
        find d⁰_p = argmin_d C_int(p, d)
        fit (a_p, b_p, c_p) via 3-point parabola
        form (α_p, β_p, γ_p)
        if s < S:
            α_p += w * A_prior(p)
            β_p += w * B_prior(p)

    # 3. multi-iteration, multi-direction aggregation
    initialize (A_p, B_p, Γ_p) = 0 for all p
    repeat T_s times:
        for direction r in 1..R:
            for pixel p along scanline r:
                if p is first in line:
                    A_curr = α_p; B_curr = β_p
                else:
                    m = -B_prev / (2 * A_prev)
                    P_adapt = P * A_prev * exp(-(I_p - I_{p-1})² / σ²)
                    A_curr = α_p + P_adapt
                    B_curr = β_p + P_adapt * (B_prev / A_prev)
                accumulate: A_p += A_curr; B_p += B_curr
                (A_prev, B_prev) ← (A_curr, B_curr)
    # optional: normalize (A_p, B_p)

    # 4. extract subpixel disparity
    for each pixel p:
        d*_p = -B_p / (2 * A_p)

    # 5. prepare priors for the next (finer) scale
    if s > 1:
        A_prior = Upsample(A_p) / F²
        B_prior = Upsample(B_p) / F
```
6. Experimental Protocols and Quantitative Results
CCA has been quantitatively evaluated on several datasets:
- DSLR: Canon DP dataset (Punnappurath et al., ICCP 2020)
- Phone: Google Pixel 2/3 (Garg et al., ICCV 2019)
- Standard Stereo: Middlebury 2014 (quarter-resolution)
Metrics include affine-invariant errors (AI(1), AI(2)), the Spearman-correlation term 1−|ρ_s|, bad-pixel rates at thresholds of 0.5, 1, and 2 pixels, and RMSE. Key results below:
Table 1. DSLR Results (geometric mean of [AI(1),AI(2),1−|ρ_s|], lower is better)
| Method | AI(1) | AI(2) | 1−\|ρ_s\| | Geo. Mean |
|---|---|---|---|---|
| SDoF | 0.087 | 0.129 | 0.291 | 0.144 |
| DPdisp | 0.047 | 0.074 | 0.082 | 0.065 |
| DPE | 0.061 | 0.098 | 0.103 | 0.110 |
| CCA | 0.041 | 0.068 | 0.061 | 0.053 |
| CCA + filter | 0.036 | 0.061 | 0.049 | 0.048 |
Table 2. Phone (Pixel) Results (same metrics, lower is better)
| Method | AI(1) | AI(2) | 1−\|ρ_s\| | Geo. Mean |
|---|---|---|---|---|
| SDoF | 0.027 | 0.037 | 0.236 | 0.063 |
| CCA | 0.026 | 0.036 | 0.225 | 0.059 |
| CCA + filter | 0.025 | 0.035 | 0.217 | 0.057 |
Table 3. Middlebury ¼-res Results (non-occluded, lower is better)
| Method | bad<0.5 px | bad<1 px | bad<2 px | RMSE |
|---|---|---|---|---|
| SGM | 26.1 % | 17.2 % | 12.2 % | 9.90 |
| CCA | 26.2 % | 18.3 % | 13.2 % | 5.20 |
| SGM + filter | 23.5 % | 15.2 % | 10.5 % | 4.04 |
| CCA + filter | 24.6 % | 16.7 % | 11.6 % | 4.04 |
CCA delivers continuous, subpixel disparities in closed form, and because it propagates only a few quadratic coefficients per pixel rather than a full vector of label costs, its time and memory scale with the image size rather than with the disparity range, circumventing the need for full cost-volume storage. On DP images, the method surpasses prior learning-based and non-learning baselines; on standard stereo, it is comparable to Semi-Global Matching (SGM), with quantitative advantages in memory efficiency.
7. Context and Implications
CCA addresses specific limitations of DP disparity extraction, namely sensitivity to PSF variation and the infeasibility of conventional stereo matching due to small disparities. The quadratic form persists throughout coefficient propagation, enabling closed-form minimization and robust multi-scale fusion. A plausible implication is that CCA’s framework could generalize to other continuous-label matching tasks where cost functions are locally convex and aggregatable under quadratic constraints. Its memory and time efficiency offer practical advantages for embedded or real-time applications. CCA’s competitive performance on standard stereo benchmarks (Middlebury) with a reduced computational footprint demonstrates its potential for broader adoption across passive depth sensing modalities (Monin et al., 2023).