
Coded-Aperture Dual-Pixel Sensing (CADS)

Updated 10 March 2026
  • CADS is an imaging technique that integrates a learnable coded aperture with dual-pixel sensors to capture both color and depth in a single shot.
  • It employs an end-to-end optimization pipeline with a U-Net-based CADNet, jointly learning the coded mask and neural reconstruction for improved image fidelity and depth accuracy.
  • Experimental evaluations demonstrate that CADS outperforms conventional methods, achieving superior PSNR, depth MAE, and robustness across various optical setups and platforms.

Coded-Aperture Dual-Pixel Sensing (CADS) is an imaging methodology designed for passive, compact, and single-shot acquisition of RGB-D (color + depth) scene information, particularly targeting applications where form factor, speed, and power constraints are paramount. CADS augments the standard dual-pixel (DP) sensor design—a sensor that effectively captures two horizontally offset sub-images per point via pixel-level micro-lens splitting—with a learnable coded aperture mask. Through end-to-end optimization, both the coded mask and a neural reconstruction network are co-adapted to maximize all-in-focus (AIF) image fidelity and depth estimation precision for a broad class of scenes and optical setups (Ghanekar et al., 2024).

1. Optical and Computational Imaging Model

The CADS system introduces a coded amplitude mask c(u) at the aperture plane of the imaging lens, modulating the scene radiance f(u) at pupil coordinate u. The DP sensor splits each pixel into left/right halves, producing two defocus point spread functions (PSFs) H^{L}(x, u; z) and H^{R}(x, u; z) that encode depth z via disparity and blur. The continuous-space forward model for the left and right DP sub-images is:

y_{L}(x) = \int H^{L}(x, u; z) \, c(u) \, f(u) \, du + n_{L}(x)

y_{R}(x) = \int H^{R}(x, u; z) \, c(u) \, f(u) \, du + n_{R}(x)

where n_{L}, n_{R} model heteroscedastic sensor noise.

In practice, the scene is discretized into K depth planes \{z_k\}, approximated by spatial per-plane intensity maps s_{z_k}(x). Precomputed PSF stacks h_{z_k}^{L,C}(x), h_{z_k}^{R,C}(x) incorporate the coded mask. The discrete forward model is:

I_L(x) = \sum_{k=1}^{K} \left[ h_{z_k}^{L,C} * s_{z_k} \right](x) + n_L(x)

I_R(x) = \sum_{k=1}^{K} \left[ h_{z_k}^{R,C} * s_{z_k} \right](x) + n_R(x)
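The discrete forward model above can be sketched numerically as follows. This is a minimal illustration, not the paper's simulator: the per-plane PSF stacks and intensity maps are passed as lists of 2-D arrays (a hypothetical layout), FFT-based convolution with circular boundary handling stands in for the true image-formation convolution, and the additive Gaussian term is only a simple stand-in for heteroscedastic sensor noise.

```python
import numpy as np

def conv2_same(kernel, img):
    """FFT-based 2-D convolution, same output size (circular boundary)."""
    return np.real(np.fft.ifft2(np.fft.fft2(kernel, img.shape) * np.fft.fft2(img)))

def dp_forward(psfs_left, psfs_right, planes, noise_sigma=0.0, rng=None):
    """Simulate left/right dual-pixel captures as a sum of per-plane
    convolutions with the coded (mask-dependent) PSFs.

    psfs_left, psfs_right, planes: lists of 2-D arrays, one per depth plane.
    """
    rng = rng or np.random.default_rng(0)
    I_L = sum(conv2_same(h, s) for h, s in zip(psfs_left, planes))
    I_R = sum(conv2_same(h, s) for h, s in zip(psfs_right, planes))
    if noise_sigma > 0:  # simplistic noise model, not the paper's
        I_L = I_L + rng.normal(0.0, noise_sigma, I_L.shape)
        I_R = I_R + rng.normal(0.0, noise_sigma, I_R.shape)
    return I_L, I_R
```

With a single in-focus plane and a delta PSF, the model reduces to an identity mapping, which makes the convention easy to sanity-check.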

The circle-of-confusion (CoC) diameter relating blur and disparity to object depth is

D(z) = \frac{L f}{1 - f/g} \left( \frac{1}{g} - \frac{1}{z} \right)

where L is the lens diameter, f the focal length, and g the in-focus distance. The normalized disparity d(z) \simeq D(z)/p is the primary depth cue for CADNet.
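The CoC relation is easy to evaluate directly. In the sketch below the optical parameters are purely illustrative (12.5 mm aperture, 50 mm focal length, 0.5 m focus distance), not values from the paper, and the normalization constant p is assumed to be a pixel pitch:

```python
def coc_diameter(z, L=0.0125, f=0.05, g=0.5):
    """Circle-of-confusion diameter D(z) = (L f / (1 - f/g)) (1/g - 1/z).
    All lengths in metres; defaults are illustrative only."""
    return (L * f / (1.0 - f / g)) * (1.0 / g - 1.0 / z)

def norm_disparity(z, p=1e-5, **kw):
    """Normalized disparity d(z) ~ D(z)/p (p assumed to be pixel pitch)."""
    return coc_diameter(z, **kw) / p
```

Objects at the in-focus distance give D(g) = 0, and the sign of D(z) flips between near and far scene points, which is what makes the CoC usable as a signed depth cue.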

2. End-to-End Optimization of Coded Masks and Reconstruction

CADS jointly learns both the coded aperture mask and the neural reconstruction algorithm in a differentiable pipeline. The amplitude mask M_\theta(x, y) is parameterized via

M_\theta(x, y) = \sigma(\alpha \cdot \theta(x, y))

where \theta \in \mathbb{R}^{H \times W}, \sigma is the sigmoid function, and \alpha is a temperature parameter annealed during training to enforce binarization.
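This parameterization can be sketched in a few lines. The geometric annealing schedule below is an illustrative assumption (the paper only states that the temperature is annealed), as is the hard threshold used at test time:

```python
import numpy as np

def soft_mask(theta, alpha):
    """Differentiable amplitude mask M = sigmoid(alpha * theta).
    Raising alpha pushes entries toward {0, 1}."""
    return 1.0 / (1.0 + np.exp(-alpha * theta))

def binarize(theta):
    """Hard threshold applied at test time (sigmoid's limit is a step at 0)."""
    return (theta > 0).astype(float)

def alpha_schedule(epoch, alpha0=1.0, rate=1.2):
    """Illustrative annealing: grow the temperature geometrically per epoch."""
    return alpha0 * rate ** epoch
```

As alpha grows, soft_mask converges elementwise to binarize, so the binary test-time mask is consistent with the relaxed mask used during training.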

The end-to-end loss is

\mathcal{L} = \mathcal{L}_{\text{AIF}} + \mathcal{L}_{\text{defocus}} + \mathcal{L}_{\text{mask}}

with composite terms:

  • \mathcal{L}_{\text{AIF}}: penalizes all-in-focus intensity and gradient error.
  • \mathcal{L}_{\text{defocus}}: penalizes the normalized defocus-map error and its gradient.
  • \mathcal{L}_{\text{mask}}: encourages at least 50% mask light throughput.

Typical loss coefficients are (\beta_1, \beta_2, \beta_3, \beta_4, \beta_5) = (1, 0.5, 1, 0.5, 10^3).
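A minimal sketch of this composite loss is given below. The exact term definitions are assumptions (L1 penalties on intensities and finite-difference gradients, and a hinge on mean mask transmission below 50%); only the weight values follow the text.

```python
import numpy as np

def grad(x):
    """Finite-difference image gradients along the two spatial axes."""
    return np.diff(x, axis=0), np.diff(x, axis=1)

def cads_loss(aif_hat, aif_gt, dfc_hat, dfc_gt, mask,
              betas=(1.0, 0.5, 1.0, 0.5, 1e3)):
    """Sketch: weighted L1 intensity + gradient terms for the AIF image and
    the normalized defocus map, plus a throughput penalty that activates
    only when mean mask transmission falls below 0.5."""
    b1, b2, b3, b4, b5 = betas
    l_aif = b1 * np.abs(aif_hat - aif_gt).mean() + b2 * sum(
        np.abs(ga - gb).mean() for ga, gb in zip(grad(aif_hat), grad(aif_gt)))
    l_dfc = b3 * np.abs(dfc_hat - dfc_gt).mean() + b4 * sum(
        np.abs(ga - gb).mean() for ga, gb in zip(grad(dfc_hat), grad(dfc_gt)))
    l_mask = b5 * max(0.0, 0.5 - mask.mean())
    return l_aif + l_dfc + l_mask
```

A perfect reconstruction with a mask above 50% throughput incurs zero loss, while dropping throughput below 50% is penalized with the large beta_5 weight.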

Training employs a simulated dataset (20,000 scenes from FlyingThings3D, depth range 32–76 mm), heteroscedastic noise augmentation, and Adam optimization. The mask is learned during the initial 30 epochs, with hard-thresholding to binary at test time.

3. CADNet Neural Reconstruction Architecture

The reconstruction algorithm, termed CADNet (Editor's term), is a deep neural network featuring a U-Net backbone:

  • Encoder-decoder with four spatial scales: depth-wise doubling and halving of feature channels (32 → 64 → 128 → 256 → 128 → 64 → 32).
  • Residual blocks: two 3×3 convolutions with ReLU at each scale.
  • Pixel-shuffle downsampler in the encoder; bilinear upsampling in the decoder.
  • Skip connections across encoder and decoder stages.
  • Pyramid Pooling Module (PPM) at the bottleneck for multi-scale feature aggregation and enhanced global context.

The mono variant receives grayscale inputs [I_L, I_R]; the RGB variant receives a 6-channel input [R_L, G_L, B_L, R_R, G_R, B_R]. Outputs are the AIF estimate and normalized defocus map (three-channel AIF and defocus for the RGB variant). The defocus output \hat{S} \in [-1, 1] is mapped back to physical depth via the inverse CoC formula, and the AIF output is clamped to [0, 1].
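The defocus-to-depth mapping can be sketched by inverting the CoC formula from Section 1 for z. The optical parameters below are illustrative (not calibrated values), and the linear scaling from the normalized defocus output to a physical CoC diameter via an assumed maximum D_max is a simplifying assumption:

```python
def depth_from_coc(D, L=0.0125, f=0.05, g=0.5):
    """Invert D(z) = (L f / (1 - f/g)) (1/g - 1/z) for z (metres).
    Parameter defaults are illustrative only."""
    inv_z = 1.0 / g - D * (1.0 - f / g) / (L * f)
    return 1.0 / inv_z

def depth_from_defocus(S_hat, D_max, **kw):
    """Map normalized defocus S_hat in [-1, 1] to metric depth, assuming a
    linear scaling D = S_hat * D_max to CoC diameter (hypothetical)."""
    return depth_from_coc(S_hat * D_max, **kw)
```

A zero defocus output maps back to the in-focus distance g, and the forward/inverse pair round-trips exactly, which is a useful consistency check for any chosen calibration.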

4. Experimental Evaluation

4.1. Simulation and Metric Benchmarks

Experiments on the FlyingThings3D benchmark (aperture f/4) show:

Method | Depth MAE (mm) | AIF PSNR (dB) | SSIM  | LPIPS
SP     | 29.37          | 27.69         | 0.790 | 0.427
DP     | 5.51           | 29.72         | 0.832 | 0.381
CADS   | 5.15           | 31.20         | 0.865 | 0.337

CADS achieves a >1.5 dB AIF PSNR gain over naive DP and a 5–6% depth MAE improvement over DP.

4.2. Aperture and Baseline Comparisons

CADS consistently outperforms all baselines (standard-pixel, coded SP, DP) across apertures f/4 to f/10. At f/8, for instance, CADS achieves the best depth MAE and AIF PSNR (5.46 mm, 33.49 dB).

Further, CADS outperforms DPDNet, DDDNet, and prior DP methods. For no-code DP captures, CADS achieves 31.20 dB AIF PSNR compared to DPDNet (24.34 dB), and achieves superior Disparity AI(1) values.

4.3. Hardware Prototypes

CADS was physically realized on three platforms:

  • DSLR prototype: Canon 5D Mark IV, 30MP DP sensor, 50 mm f/4 lens, 12.5 mm coded mask. After post-hoc PSF calibration, CADS yields sharper AIF and more accurate depth compared to DP and SP.
  • Endoscope: Storz Rubina scope + DSLR, 2.5 mm coded mask, 800×800 reconstructions, resolves ~40 μm features.
  • Dermoscope: Pixel 4 + 12× macro optic, 2.5 mm mask, 450×450 reconstructions, sharp 3D skin-lesion AIF and depth.

5. Analysis and Practical Implications

CADS modulates the trade-off between depth estimation (favoring large disparity and large horizontal blur) and all-in-focus image quality (favoring minimal blur) by shaping the DP PSFs via the learned mask. This allows preservation of horizontal disparity cues needed for depth while reducing vertical blur detrimental to AIF reconstruction.

Aperture size determines a fundamental trade-off: larger apertures increase disparity and depth accuracy but worsen blur, while smaller apertures reduce both blur and disparity. CADS shifts the Pareto frontier, improving both metrics at all tested f-numbers, with mask throughput maintained at ≥ 50%.

End-to-end training with realistic, heteroscedastic noise achieves robustness despite coded mask light losses, essential for deployment in low-SNR environments.

6. Applications and Future Directions

The ultra-compact, passive, snapshot nature of CADS makes it suitable for:

  • 3D endoscopic guidance and medical imaging
  • Microscopy applications with single-shot depth
  • Dermatological 3D imaging (e.g., EDOF dermoscopy)
  • Autonomous driving and robotics where form factor and single-frame performance are crucial

Future research directions identified include phase-mask CADS variants for increased light efficiency, physics-informed self-supervision for real-scene training without full ground truth, and the integration of CADS principles into mass-market commercial DP sensors (Ghanekar et al., 2024).
