Coded-Aperture Dual-Pixel Sensing (CADS)
- CADS is an imaging technique that integrates a learnable coded aperture with dual-pixel sensors to capture both color and depth in a single shot.
- It employs an end-to-end optimization pipeline with a U-Net-based CADNet, jointly learning the coded mask and neural reconstruction for improved image fidelity and depth accuracy.
- Experimental evaluations demonstrate that CADS outperforms conventional methods, achieving superior PSNR, depth MAE, and robustness across various optical setups and platforms.
Coded-Aperture Dual-Pixel Sensing (CADS) is an imaging methodology designed for passive, compact, and single-shot acquisition of RGB-D (color + depth) scene information, particularly targeting applications where form factor, speed, and power constraints are paramount. CADS augments the standard dual-pixel (DP) sensor design—a sensor that effectively captures two horizontally offset sub-images per point via pixel-level micro-lens splitting—with a learnable coded aperture mask. Through end-to-end optimization, both the coded mask and a neural reconstruction network are co-adapted to maximize all-in-focus (AIF) image fidelity and depth estimation precision for a broad class of scenes and optical setups (Ghanekar et al., 2024).
1. Optical and Computational Imaging Model
The CADS system introduces a coded amplitude mask $M(u, v)$ at the aperture plane of the imaging lens, modulating the scene radiance at pupil coordinates $(u, v)$. The DP sensor splits each pixel into left/right halves, producing two defocus point spread functions (PSFs) $\mathrm{PSF}^{l}_{z}$ and $\mathrm{PSF}^{r}_{z}$ that encode depth via disparity and blur. The continuous-space forward model for the left and right DP sub-images is

$$I^{l/r}(x, y) = \left(I_{\mathrm{AIF}} * \mathrm{PSF}^{l/r}_{z}\right)(x, y) + n^{l/r}(x, y),$$

where $n^{l}$, $n^{r}$ model heteroscedastic sensor noise.
In practice, the scene is discretized into $K$ depth planes $\{z_k\}_{k=1}^{K}$, approximated by per-plane spatial intensity maps $\{I_k\}$. Precomputed PSF stacks $\{\mathrm{PSF}^{l}_{z_k}\}$, $\{\mathrm{PSF}^{r}_{z_k}\}$ incorporate the coded mask. The discrete forward model is

$$I^{l/r}[x, y] = \sum_{k=1}^{K} \left(I_k * \mathrm{PSF}^{l/r}_{z_k}\right)[x, y] + n^{l/r}[x, y].$$
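The discrete, layered forward model can be sketched in a few lines of NumPy. The array shapes, PSF contents, and noise parameters below are illustrative assumptions, not the paper's calibrated values:

```python
import numpy as np

def conv2d_same(img, ker):
    """True 2-D convolution with 'same' output size (zero padding)."""
    kf = ker[::-1, ::-1]                      # flip kernel for convolution
    kh, kw = kf.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img)
    for i in range(kh):
        for j in range(kw):
            out += kf[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def dp_forward(planes, psf_l, psf_r, read_sigma=0.01, shot_gain=0.005, seed=0):
    """Simulate left/right dual-pixel captures from K depth planes.

    planes: (K, H, W) per-plane intensity maps I_k
    psf_l, psf_r: (K, h, w) precomputed left/right PSF stacks (coded mask baked in)
    """
    rng = np.random.default_rng(seed)
    K = planes.shape[0]
    I_l = sum(conv2d_same(planes[k], psf_l[k]) for k in range(K))
    I_r = sum(conv2d_same(planes[k], psf_r[k]) for k in range(K))
    # Heteroscedastic noise: a read-noise floor plus a signal-dependent (shot) term.
    std_l = np.sqrt(read_sigma**2 + shot_gain * np.clip(I_l, 0, None))
    std_r = np.sqrt(read_sigma**2 + shot_gain * np.clip(I_r, 0, None))
    return I_l + rng.normal(0.0, std_l), I_r + rng.normal(0.0, std_r)
```

With identical delta-function PSF stacks and zero noise, the simulated left and right captures reduce to the sum of the depth planes, which is a quick sanity check of the layered model.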
The circle-of-confusion (CoC) diameter relating blur and disparity to object depth $z$ is

$$c(z) = L \, \frac{f}{z_f - f} \left| 1 - \frac{z_f}{z} \right|,$$

where $L$ is the lens aperture diameter, $f$ the focal length, and $z_f$ the in-focus distance. Normalized disparity, derived from the signed CoC, is the primary depth cue for CADNet.
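As a quick numerical check of this relation, the sketch below evaluates the signed CoC, whose sign gives the disparity direction and whose magnitude gives the blur diameter; the lens parameters used in the assertions are arbitrary illustrations:

```python
def coc_diameter(z, L, f, z_f):
    """Signed circle-of-confusion diameter for an object at depth z.

    L: aperture diameter, f: focal length, z_f: in-focus distance,
    all in consistent units (e.g. mm). Zero at z == z_f; the sign flips
    across the focal plane, which is what carries the DP disparity cue.
    """
    return L * (f / (z_f - f)) * (1.0 - z_f / z)
```

Objects behind the focal plane yield a positive value and objects in front a negative one, matching the left/right roles of the DP sub-images.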
2. End-to-End Optimization of Coded Masks and Reconstruction
CADS jointly learns both the coded aperture mask and the neural reconstruction algorithm in a differentiable pipeline. The amplitude mask is parameterized via

$$M = \sigma(w / \tau),$$

where $w$ is a matrix of unconstrained mask logits, $\sigma$ is the sigmoid, and $\tau$ is a temperature parameter annealed during training to enforce binarization.
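A minimal NumPy sketch of this parameterization; the logit values and annealing schedule are illustrative assumptions:

```python
import numpy as np

def soft_mask(w, tau):
    """Amplitude mask M = sigmoid(w / tau); smaller tau pushes entries toward {0, 1}."""
    return 1.0 / (1.0 + np.exp(-w / tau))

# Annealing: the same logits yield an increasingly binary mask as tau shrinks.
w = np.array([[2.0, -1.0], [-3.0, 0.5]])       # hypothetical learned logits
masks = [soft_mask(w, tau) for tau in (1.0, 0.3, 0.05)]

# Hard-thresholded binary mask, as used at test time:
m_test = (soft_mask(w, 0.05) > 0.5).astype(float)
```

Because the sigmoid stays differentiable at every temperature, gradients flow to the mask logits throughout training, while the annealed temperature drives the soft mask toward the binary mask deployed at test time.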
The end-to-end loss is

$$\mathcal{L} = \lambda_{\mathrm{AIF}} \mathcal{L}_{\mathrm{AIF}} + \lambda_{\mathrm{def}} \mathcal{L}_{\mathrm{def}} + \lambda_{\mathrm{thr}} \mathcal{L}_{\mathrm{thr}},$$

with composite terms:
- $\mathcal{L}_{\mathrm{AIF}}$: penalizes all-in-focus intensity error and its gradient.
- $\mathcal{L}_{\mathrm{def}}$: penalizes normalized defocus map error and its gradient.
- $\mathcal{L}_{\mathrm{thr}}$: encourages a minimum level of mask light throughput.

The coefficients $\lambda_{\mathrm{AIF}}$, $\lambda_{\mathrm{def}}$, $\lambda_{\mathrm{thr}}$ balance image fidelity, depth accuracy, and light efficiency.
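The weighted sum of loss terms can be sketched as follows; the gradient penalty uses plain finite differences, and the weights and minimum-throughput target are placeholder assumptions rather than the paper's tuned values:

```python
import numpy as np

def grad_l1(a, b):
    """L1 error between horizontal and vertical finite-difference gradients."""
    gx = np.abs(np.diff(a, axis=1) - np.diff(b, axis=1)).mean()
    gy = np.abs(np.diff(a, axis=0) - np.diff(b, axis=0)).mean()
    return gx + gy

def cads_loss(aif_hat, aif_gt, def_hat, def_gt, mask,
              lam_aif=1.0, lam_def=1.0, lam_thr=0.1, min_thr=0.4):
    l_aif = np.abs(aif_hat - aif_gt).mean() + grad_l1(aif_hat, aif_gt)  # intensity + gradient
    l_def = np.abs(def_hat - def_gt).mean() + grad_l1(def_hat, def_gt)  # defocus + gradient
    l_thr = max(0.0, min_thr - mask.mean())   # hinge penalty on low light throughput
    return lam_aif * l_aif + lam_def * l_def + lam_thr * l_thr
```

The hinge form of the throughput term is inactive once the mask passes enough light, so it only pushes back when binarization would close off too much of the aperture.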
Training employs a simulated dataset (20,000 scenes from FlyingThings3D, depth range 32–76 mm), heteroscedastic noise augmentation, and Adam optimization. The mask is learned during the initial 30 epochs and hard-thresholded to binary values at test time.
3. CADNet Neural Reconstruction Architecture
The reconstruction algorithm, termed CADNet (Editor's term), is a deep neural network featuring a U-Net backbone:
- Encoder-decoder with four spatial scales: feature channels double at each downsampling stage and halve at each upsampling stage (32 → 64 → 128 → 256 → 128 → 64 → 32).
- Residual blocks: Two convolutions with ReLU at each scale.
- Pixel-shuffle downsampler in the encoder; bilinear upsampling in the decoder.
- Skip connections across encoder and decoder stages.
- Pyramid Pooling Module (PPM) at the bottleneck for multi-scale feature aggregation and enhanced global context.
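The pixel-shuffle downsampler trades spatial resolution for channels losslessly, unlike strided pooling. A minimal factor-2 space-to-depth sketch, with an assumed (C, H, W) array layout:

```python
import numpy as np

def pixel_unshuffle(x, r=2):
    """Rearrange (C, H, W) -> (C * r * r, H // r, W // r); no information is discarded."""
    C, H, W = x.shape
    x = x.reshape(C, H // r, r, W // r, r)
    return x.transpose(0, 2, 4, 1, 3).reshape(C * r * r, H // r, W // r)
```

Every input value survives the rearrangement, which suits a reconstruction task where fine detail must reach the decoder; in CADNet the decoder side instead uses bilinear upsampling, so only the encoder path is strictly invertible in this sense.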
The mono variant receives the grayscale left/right DP pair; the RGB variant receives a 6-channel input formed by concatenating the left and right RGB sub-images. Outputs are the AIF estimate and normalized defocus map, or (for RGB) a three-channel AIF image and a defocus map. The defocus output is mapped back to physical depth via the inverse CoC formula, and the AIF output is clamped to the valid intensity range $[0, 1]$.
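Mapping the network's defocus output back to metric depth amounts to inverting the signed CoC relation from Section 1; a sketch under that assumption (any scaling from the network's normalized defocus units to CoC units is omitted):

```python
def depth_from_coc(c, L, f, z_f):
    """Invert the signed CoC relation c = L * f / (z_f - f) * (1 - z_f / z)."""
    s = c * (z_f - f) / (L * f)   # equals 1 - z_f / z
    return z_f / (1.0 - s)
```

A round trip through the forward relation and this inverse recovers the original depth, confirming the two formulas are mutually consistent.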
4. Experimental Evaluation
4.1. Simulation and Metric Benchmarks
Experiments on the FlyingThings3D benchmark (at a fixed aperture setting) show:
| Method | Depth MAE (mm) | AIF PSNR (dB) | SSIM | LPIPS |
|---|---|---|---|---|
| SP | 29.37 | 27.69 | 0.790 | 0.427 |
| DP | 5.51 | 29.72 | 0.832 | 0.381 |
| CADS | 5.15 | 31.20 | 0.865 | 0.337 |
CADS achieves a roughly 1.5 dB AIF PSNR gain over naive DP and a 0.36 mm (about 6.5%) reduction in depth MAE.
4.2. Aperture and Baseline Comparisons
CADS consistently outperforms all baselines (standard-pixel, coded SP, DP) across the tested range of apertures. At one representative f-number, for instance, CADS achieves the best depth MAE (5.46 mm) and AIF PSNR (33.49 dB).
Further, CADS outperforms DPDNet, DDDNet, and prior DP methods. For no-code DP captures, CADS achieves 31.20 dB AIF PSNR compared to 24.34 dB for DPDNet, and achieves superior disparity AI(1) values.
4.3. Hardware Prototypes
CADS was physically realized on three platforms:
- DSLR prototype: Canon 5D Mark IV, 30MP DP sensor, 50 mm lens, 12.5 mm coded mask. After post-hoc PSF calibration, CADS yields sharper AIF and more accurate depth compared to DP and SP.
- Endoscope: Storz Rubina scope + DSLR, 2.5 mm coded mask, 800×800 reconstructions, resolves 40 μm features.
- Dermoscope: Pixel 4 + 12× macro optic, 2.5 mm mask, 450×450 reconstructions, sharp 3D skin-lesion AIF and depth.
5. Analysis and Practical Implications
CADS modulates the trade-off between depth estimation (favoring large disparity and large horizontal blur) and all-in-focus image quality (favoring minimal blur) by shaping the DP PSFs via the learned mask. This allows preservation of horizontal disparity cues needed for depth while reducing vertical blur detrimental to AIF reconstruction.
Aperture size determines a fundamental trade-off: larger apertures increase disparity and depth accuracy but worsen blur, while smaller apertures reduce both blur and disparity. CADS shifts the Pareto frontier, improving both metrics at all tested f-numbers, with mask light throughput held above the minimum enforced by the throughput loss.
End-to-end training with realistic, heteroscedastic noise achieves robustness despite coded mask light losses, essential for deployment in low-SNR environments.
6. Applications and Future Directions
The ultra-compact, passive, snapshot nature of CADS makes it suitable for:
- 3D endoscopic guidance and medical imaging
- Microscopy applications with single-shot depth
- Dermatological 3D imaging (e.g., EDOF dermoscopy)
- Autonomous driving and robotics where form factor and single-frame performance are crucial
Future research directions identified include phase-mask CADS variants for increased light efficiency, physics-informed self-supervision for real-scene training without full ground truth, and the integration of CADS principles into mass-market commercial DP sensors (Ghanekar et al., 2024).