Coded-Aperture Dual-Pixel Sensing (CADS)
- CADS is an imaging technique that integrates a learnable coded aperture with dual-pixel sensors to capture both color and depth in a single shot.
- It employs an end-to-end optimization pipeline with a U-Net-based CADNet, jointly learning the coded mask and neural reconstruction for improved image fidelity and depth accuracy.
- Experimental evaluations demonstrate that CADS outperforms conventional methods, achieving superior PSNR, depth MAE, and robustness across various optical setups and platforms.
Coded-Aperture Dual-Pixel Sensing (CADS) is an imaging methodology designed for passive, compact, and single-shot acquisition of RGB-D (color + depth) scene information, particularly targeting applications where form factor, speed, and power constraints are paramount. CADS augments the standard dual-pixel (DP) sensor design—a sensor that effectively captures two horizontally offset sub-images per point via pixel-level micro-lens splitting—with a learnable coded aperture mask. Through end-to-end optimization, both the coded mask and a neural reconstruction network are co-adapted to maximize all-in-focus (AIF) image fidelity and depth estimation precision for a broad class of scenes and optical setups (Ghanekar et al., 2024).
1. Optical and Computational Imaging Model
The CADS system introduces a coded amplitude mask $M(u, v)$ at the aperture plane of the imaging lens, modulating the scene radiance at pupil coordinates $(u, v)$. The DP sensor splits each pixel into left/right halves, producing two defocus point spread functions (PSFs) $\mathrm{PSF}^{l}_{z}$ and $\mathrm{PSF}^{r}_{z}$ that encode depth via disparity and blur. The continuous-space forward model for the left and right DP sub-images is

$$I^{l/r}(x, y) = \left(I_{\mathrm{AIF}} * \mathrm{PSF}^{l/r}_{z}\right)(x, y) + n^{l/r}(x, y),$$

where $n^{l}$, $n^{r}$ model heteroscedastic sensor noise.
In practice, the scene is discretized into $K$ depth planes $\{z_k\}_{k=1}^{K}$, approximated by per-plane spatial intensity maps $\{I_k\}$. Precomputed PSF stacks $\{\mathrm{PSF}^{l}_{z_k}\}$, $\{\mathrm{PSF}^{r}_{z_k}\}$ incorporate the coded mask. The discrete forward model is

$$I^{l/r}[x, y] = \sum_{k=1}^{K} \left(I_k * \mathrm{PSF}^{l/r}_{z_k}\right)[x, y] + n^{l/r}[x, y].$$
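The discrete, layered forward model can be sketched in a few lines of NumPy. The array shapes, PSF contents, and noise parameters below are illustrative assumptions, not the paper's calibrated values:

```python
import numpy as np

def conv2d_same(img, ker):
    """True 2-D convolution with 'same' output size (zero padding)."""
    kf = ker[::-1, ::-1]                      # flip kernel for convolution
    kh, kw = kf.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img)
    for i in range(kh):
        for j in range(kw):
            out += kf[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def dp_forward(planes, psf_l, psf_r, read_sigma=0.01, shot_gain=0.005, seed=0):
    """Simulate left/right dual-pixel captures from K depth planes.

    planes: (K, H, W) per-plane intensity maps I_k
    psf_l, psf_r: (K, h, w) precomputed left/right PSF stacks (coded mask baked in)
    """
    rng = np.random.default_rng(seed)
    K = planes.shape[0]
    I_l = sum(conv2d_same(planes[k], psf_l[k]) for k in range(K))
    I_r = sum(conv2d_same(planes[k], psf_r[k]) for k in range(K))
    # Heteroscedastic noise: a read-noise floor plus a signal-dependent (shot) term.
    std_l = np.sqrt(read_sigma**2 + shot_gain * np.clip(I_l, 0, None))
    std_r = np.sqrt(read_sigma**2 + shot_gain * np.clip(I_r, 0, None))
    return I_l + rng.normal(0.0, std_l), I_r + rng.normal(0.0, std_r)
```

With identical delta-function PSF stacks and zero noise, the simulated left and right captures reduce to the sum of the depth planes, which is a quick sanity check of the layered model.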
The circle-of-confusion (CoC) diameter relating blur and disparity to object depth $z$ is

$$c(z) = L \, \frac{f}{z_f - f} \left| 1 - \frac{z_f}{z} \right|,$$

where $L$ is the lens aperture diameter, $f$ the focal length, and $z_f$ the in-focus distance. Normalized disparity, derived from the signed CoC, is the primary depth cue for CADNet.
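As a quick numerical check of this relation, the sketch below evaluates the signed CoC, whose sign gives the disparity direction and whose magnitude gives the blur diameter; the lens parameters used in the assertions are arbitrary illustrations:

```python
def coc_diameter(z, L, f, z_f):
    """Signed circle-of-confusion diameter for an object at depth z.

    L: aperture diameter, f: focal length, z_f: in-focus distance,
    all in consistent units (e.g. mm). Zero at z == z_f; the sign flips
    across the focal plane, which is what carries the DP disparity cue.
    """
    return L * (f / (z_f - f)) * (1.0 - z_f / z)
```

Objects behind the focal plane yield a positive value and objects in front a negative one, matching the left/right roles of the DP sub-images.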
2. End-to-End Optimization of Coded Masks and Reconstruction
CADS jointly learns both the coded aperture mask and the neural reconstruction algorithm in a differentiable pipeline. The amplitude mask is parameterized via

$$M = \sigma(w / \tau),$$

where $w$ is a matrix of unconstrained mask logits, $\sigma$ is the sigmoid, and $\tau$ is a temperature parameter annealed during training to enforce binarization.
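A minimal NumPy sketch of this parameterization; the logit values and annealing schedule are illustrative assumptions:

```python
import numpy as np

def soft_mask(w, tau):
    """Amplitude mask M = sigmoid(w / tau); smaller tau pushes entries toward {0, 1}."""
    return 1.0 / (1.0 + np.exp(-w / tau))

# Annealing: the same logits yield an increasingly binary mask as tau shrinks.
w = np.array([[2.0, -1.0], [-3.0, 0.5]])       # hypothetical learned logits
masks = [soft_mask(w, tau) for tau in (1.0, 0.3, 0.05)]

# Hard-thresholded binary mask, as used at test time:
m_test = (soft_mask(w, 0.05) > 0.5).astype(float)
```

Because the sigmoid stays differentiable at every temperature, gradients flow to the mask logits throughout training, while the annealed temperature drives the soft mask toward the binary mask deployed at test time.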
The end-to-end loss is

$$\mathcal{L} = \lambda_{\mathrm{AIF}} \mathcal{L}_{\mathrm{AIF}} + \lambda_{\mathrm{def}} \mathcal{L}_{\mathrm{def}} + \lambda_{\mathrm{thr}} \mathcal{L}_{\mathrm{thr}},$$

with composite terms:
- $\mathcal{L}_{\mathrm{AIF}}$: penalizes all-in-focus intensity error and its gradient.
- $\mathcal{L}_{\mathrm{def}}$: penalizes normalized defocus map error and its gradient.
- $\mathcal{L}_{\mathrm{thr}}$: encourages a minimum level of mask light throughput.

The coefficients $\lambda_{\mathrm{AIF}}$, $\lambda_{\mathrm{def}}$, $\lambda_{\mathrm{thr}}$ balance image fidelity, depth accuracy, and light efficiency.
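The weighted sum of loss terms can be sketched as follows; the gradient penalty uses plain finite differences, and the weights and minimum-throughput target are placeholder assumptions rather than the paper's tuned values:

```python
import numpy as np

def grad_l1(a, b):
    """L1 error between horizontal and vertical finite-difference gradients."""
    gx = np.abs(np.diff(a, axis=1) - np.diff(b, axis=1)).mean()
    gy = np.abs(np.diff(a, axis=0) - np.diff(b, axis=0)).mean()
    return gx + gy

def cads_loss(aif_hat, aif_gt, def_hat, def_gt, mask,
              lam_aif=1.0, lam_def=1.0, lam_thr=0.1, min_thr=0.4):
    l_aif = np.abs(aif_hat - aif_gt).mean() + grad_l1(aif_hat, aif_gt)  # intensity + gradient
    l_def = np.abs(def_hat - def_gt).mean() + grad_l1(def_hat, def_gt)  # defocus + gradient
    l_thr = max(0.0, min_thr - mask.mean())   # hinge penalty on low light throughput
    return lam_aif * l_aif + lam_def * l_def + lam_thr * l_thr
```

The hinge form of the throughput term is inactive once the mask passes enough light, so it only pushes back when binarization would close off too much of the aperture.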
Training employs a simulated dataset (20,000 scenes from FlyingThings3D, depth range 32–76 mm), heteroscedastic noise augmentation, and Adam optimization. The mask is learned during the initial 30 epochs and hard-thresholded to binary values at test time.
3. CADNet Neural Reconstruction Architecture
The reconstruction algorithm, termed CADNet (Editor's term), is a deep neural network featuring a U-Net backbone:
- Encoder-decoder with four spatial scales: feature channels double at each downsampling stage and halve at each upsampling stage (32 → 64 → 128 → 256 → 128 → 64 → 32).
- Residual blocks: Two convolutions with ReLU at each scale.
- Pixel-shuffle downsampler in the encoder; bilinear upsampling in the decoder.
- Skip connections across encoder and decoder stages.
- Pyramid Pooling Module (PPM) at the bottleneck for multi-scale feature aggregation and enhanced global context.
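The pixel-shuffle downsampler trades spatial resolution for channels losslessly, unlike strided pooling. A minimal factor-2 space-to-depth sketch, with an assumed (C, H, W) array layout:

```python
import numpy as np

def pixel_unshuffle(x, r=2):
    """Rearrange (C, H, W) -> (C * r * r, H // r, W // r); no information is discarded."""
    C, H, W = x.shape
    x = x.reshape(C, H // r, r, W // r, r)
    return x.transpose(0, 2, 4, 1, 3).reshape(C * r * r, H // r, W // r)
```

Every input value survives the rearrangement, which suits a reconstruction task where fine detail must reach the decoder; in CADNet the decoder side instead uses bilinear upsampling, so only the encoder path is strictly invertible in this sense.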
The mono variant receives the grayscale left/right DP pair; the RGB variant receives a 6-channel input formed by concatenating the left and right RGB sub-images. Outputs are the AIF estimate and normalized defocus map, or (for RGB) a three-channel AIF image and a defocus map. The defocus output is mapped back to physical depth via the inverse CoC formula, and the AIF output is clamped to the valid intensity range $[0, 1]$.
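Mapping the network's defocus output back to metric depth amounts to inverting the signed CoC relation from Section 1; a sketch under that assumption (any scaling from the network's normalized defocus units to CoC units is omitted):

```python
def depth_from_coc(c, L, f, z_f):
    """Invert the signed CoC relation c = L * f / (z_f - f) * (1 - z_f / z)."""
    s = c * (z_f - f) / (L * f)   # equals 1 - z_f / z
    return z_f / (1.0 - s)
```

A round trip through the forward relation and this inverse recovers the original depth, confirming the two formulas are mutually consistent.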
4. Experimental Evaluation
4.1. Simulation and Metric Benchmarks
Experiments on the FlyingThings3D benchmark (at a fixed aperture setting) show:
| Method | Depth MAE (mm) | AIF PSNR (dB) | SSIM | LPIPS |
|---|---|---|---|---|
| SP | 29.37 | 27.69 | 0.790 | 0.427 |
| DP | 5.51 | 29.72 | 0.832 | 0.381 |
| CADS | 5.15 | 31.20 | 0.865 | 0.337 |
CADS achieves a roughly 1.5 dB AIF PSNR gain over naive DP and a 0.36 mm (about 6.5%) reduction in depth MAE.
4.2. Aperture and Baseline Comparisons
CADS consistently outperforms all baselines (standard-pixel, coded SP, DP) across the tested range of apertures. At one representative f-number, for instance, CADS achieves the best depth MAE (5.46 mm) and AIF PSNR (33.49 dB).
Further, CADS outperforms DPDNet, DDDNet, and prior DP methods. For no-code DP captures, CADS achieves 31.20 dB AIF PSNR compared to 24.34 dB for DPDNet, and achieves superior disparity AI(1) values.
4.3. Hardware Prototypes
CADS was physically realized on three platforms:
- DSLR prototype: Canon 5D Mark IV, 30MP DP sensor, 50 mm lens, 12.5 mm coded mask. After post-hoc PSF calibration, CADS yields sharper AIF and more accurate depth compared to DP and SP.
- Endoscope: Storz Rubina scope + DSLR, 2.5 mm coded mask, 800×800 reconstructions, resolves 40 μm features.
- Dermoscope: Pixel 4 + 12× macro optic, 2.5 mm mask, 450×450 reconstructions, sharp 3D skin-lesion AIF and depth.
5. Analysis and Practical Implications
CADS modulates the trade-off between depth estimation (favoring large disparity and large horizontal blur) and all-in-focus image quality (favoring minimal blur) by shaping the DP PSFs via the learned mask. This allows preservation of horizontal disparity cues needed for depth while reducing vertical blur detrimental to AIF reconstruction.
Aperture size determines a fundamental trade-off: larger apertures increase disparity and depth accuracy but worsen blur, while smaller apertures reduce both blur and disparity. CADS shifts the Pareto frontier, improving both metrics at all tested f-numbers, with mask light throughput held above the minimum enforced by the throughput loss.
End-to-end training with realistic, heteroscedastic noise achieves robustness despite coded mask light losses, essential for deployment in low-SNR environments.
6. Applications and Future Directions
The ultra-compact, passive, snapshot nature of CADS makes it suitable for:
- 3D endoscopic guidance and medical imaging
- Microscopy applications with single-shot depth
- Dermatological 3D imaging (e.g., EDOF dermoscopy)
- Autonomous driving and robotics where form factor and single-frame performance are crucial
Future research directions identified include phase-mask CADS variants for increased light efficiency, physics-informed self-supervision for real-scene training without full ground truth, and the integration of CADS principles into mass-market commercial DP sensors (Ghanekar et al., 2024).