S-CNN: Supervised CNN for Phase Retrieval
- Supervised Convolutional Neural Network (S-CNN) is a deep learning approach that maps 3D diffraction intensities to reciprocal-space phase for robust phase retrieval in BCDI.
- It uses a 3D U-Net–style encoder–decoder with dilated convolutions and skip connections, complemented by a symmetry-robust weighted coherent average loss function.
- The method significantly accelerates phase reconstruction compared to traditional iterative algorithms, achieving high-fidelity real-space images even under extreme strain.
A Supervised Convolutional Neural Network (S-CNN) for phase retrieval in Bragg Coherent Diffraction Imaging (BCDI) is a 3D U-Net–style encoder–decoder architecture trained on simulated diffraction intensities to directly predict the reciprocal-space phase. Standard iterative phase-retrieval algorithms struggle to reconstruct highly strained crystals because of convergence issues under extreme strain. The S-CNN approach exploits the fact that its input (intensity) and output (phase) both live in reciprocal space, which addresses this computational obstacle and extends BCDI to highly strained particles. The method maps the measured diffraction intensities to the phase via supervised learning, enabling rapid and accurate phase reconstruction. The real-space object is then recovered with an inverse Fourier transform, followed by physics-consistent iterative refinement (Masto et al., 9 Jul 2025).
1. Network Architecture and Input Preprocessing
The foundation of the S-CNN is a fully 3D U-Net–style encoder–decoder comprising approximately 143 million parameters. Input data consist of batches of 3D diffraction intensity volumes, preprocessed by logarithmic scaling followed by min–max normalization. The encoder path contains four down-sampling stages; each stage applies two parallel dilated convolutions with ReLU activations, followed by max pooling (stride 2). The bottleneck consists of two dilated convolutions (with dilations $1, 2, 4$) without pooling.
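The preprocessing described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the additive offset `eps`, the target range of 0 to 1 (the usual min–max convention), and per-volume normalization are all assumptions.

```python
import numpy as np

def preprocess_intensity(volume, eps=1.0):
    """Log-scale a 3D intensity volume, then min-max normalize it.

    eps is an assumed offset so that zero counts map to log(1) = 0.
    """
    volume = np.asarray(volume, dtype=np.float64)
    logged = np.log(volume + eps)          # compress the dynamic range
    lo, hi = logged.min(), logged.max()
    if hi == lo:
        # A constant volume has no contrast; avoid dividing by zero.
        return np.zeros_like(logged)
    return (logged - lo) / (hi - lo)       # assumed target range: 0 to 1

# Toy example: Poisson counts standing in for a measured volume.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=50.0, size=(8, 8, 8))
x = preprocess_intensity(counts)
```

Log scaling matters because diffraction intensities span several orders of magnitude between the Bragg peak and the outer fringes; without it the weak fringes would be numerically negligible to the network.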
Decoding reverses the process with four up-sampling stages, each applying a transposed convolution (stride 2), concatenation (skip connection) with the corresponding encoder feature map, and two convolution + ReLU layers. The output head is a convolution with no bounding activation that maps to a single phase volume, so the predicted phase is not confined to a wrapped interval.
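To make the stage structure concrete, the short sketch below traces the edge length of a cubic volume through four stride-2 poolings and four stride-2 transposed convolutions. The input edge length of 64 is a hypothetical value chosen for illustration, and size-preserving ("same") padding in the convolutions is assumed.

```python
def unet_spatial_sizes(n, stages=4):
    """Trace the edge length of a cubic volume through encoder and decoder.

    Assumes size-preserving convolutions, so only the stride-2 pooling and
    stride-2 transposed convolutions change the spatial size.
    """
    if n % (2 ** stages) != 0:
        raise ValueError("edge length must be divisible by 2**stages")
    # Sizes at which each encoder stage hands a feature map to its skip connection.
    encoder = [n // (2 ** s) for s in range(stages)]
    bottleneck = n // (2 ** stages)
    # Each up-sampling stage doubles the size and meets the matching skip.
    decoder = encoder[::-1]
    return encoder, bottleneck, decoder

enc, mid, dec = unet_spatial_sizes(64)  # 64 is a hypothetical input size
```

The divisibility check reflects a practical constraint of this kind of architecture: skip connections can only be concatenated if encoder and decoder feature maps have identical spatial sizes.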
2. Mathematical Formulation of Phase-Prediction Mapping
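Let $\mathbf{q}$ denote reciprocal-space voxel coordinates and $\mathbf{r}$ real-space coordinates. The BCDI measurement is defined by the squared magnitude of the 3D Fourier transform:
$$I(\mathbf{q}) = \left| \mathcal{F}\left\{ \rho(\mathbf{r})\, e^{i\phi(\mathbf{r})} \right\}(\mathbf{q}) \right|^2$$
where $\rho(\mathbf{r})$ (modulus) and $\phi(\mathbf{r})$ (phase) describe the real-space object. The S-CNN is trained to approximate the inverse mapping:
$$f_\theta : I(\mathbf{q}) \mapsto \varphi(\mathbf{q})$$
where $\varphi(\mathbf{q})$ denotes the reciprocal-space phase. After S-CNN prediction $\hat{\varphi}(\mathbf{q})$, the complex diffracted amplitude is reconstructed:
$$\hat{A}(\mathbf{q}) = \sqrt{I(\mathbf{q})}\, e^{i\hat{\varphi}(\mathbf{q})}$$
The real-space reconstruction is then obtained via a single inverse Fourier transform:
$$\hat{\rho}(\mathbf{r})\, e^{i\hat{\phi}(\mathbf{r})} = \mathcal{F}^{-1}\left\{ \hat{A}(\mathbf{q}) \right\}(\mathbf{r})$$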
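As a minimal NumPy sketch of this reconstruction step, the function below combines an intensity volume with a reciprocal-space phase and applies one inverse FFT. The function name and the centred-array convention (Bragg peak in the middle of the volume) are assumptions for illustration. The toy check feeds in the exact phase, where a trained network would supply its prediction.

```python
import numpy as np

def reconstruct_object(intensity, predicted_phase):
    """Form sqrt(I) * exp(i * phase) and invert it with a single 3D inverse FFT."""
    amplitude = np.sqrt(np.clip(intensity, 0.0, None)) * np.exp(1j * predicted_phase)
    # Arrays are assumed centred, hence the shifts around the inverse FFT.
    obj = np.fft.fftshift(np.fft.ifftn(np.fft.ifftshift(amplitude)))
    return np.abs(obj), np.angle(obj)

# Toy object: a cube with a smooth Gaussian phase bump.
n = 16
z, y, x = np.indices((n, n, n)) - n // 2
modulus = ((np.abs(x) < 4) & (np.abs(y) < 4) & (np.abs(z) < 4)).astype(float)
phase = 0.5 * np.exp(-(x**2 + y**2 + z**2) / 8.0)
true_obj = modulus * np.exp(1j * phase)

# Forward model: centred 3D FFT, then squared magnitude.
far_field = np.fft.fftshift(np.fft.fftn(np.fft.ifftshift(true_obj)))
intensity = np.abs(far_field) ** 2

# With the exact reciprocal-space phase, the object is recovered exactly.
mod_rec, phase_rec = reconstruct_object(intensity, np.angle(far_field))
```

With a perfect phase the inversion is exact up to floating-point error, which is why the quality of the final image depends entirely on the accuracy of the predicted phase.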
3. Loss Function and Symmetry Handling
The phase retrieval problem involves symmetry ambiguities: the phase is only defined up to an overall constant offset, a linear ramp, $2\pi$ wrapping, and a sign flip. The Weighted Coherent Average (WCA) loss is introduced for training to address these symmetries explicitly. It is evaluated over all voxels from the log-scaled input intensity, the ground-truth phase, and the predicted phase.
The WCA loss automatically accommodates wrap ($2\pi$ periodicity, via the complex exponential) and sign ambiguity, and weights the phase mismatch by the intensity so that optimization focuses on high-signal regions.
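The sketch below shows one plausible way to build a loss with these properties. It is not the published WCA formula, whose exact form is not reproduced here. It assumes the loss is one minus the modulus of an intensity-weighted average of complex phase differences, taking the better of the two sign conventions. The name `wca_like_loss` is hypothetical, and the ramp ambiguity is not handled in this sketch.

```python
import numpy as np

def wca_like_loss(weights, phase_true, phase_pred):
    """Illustrative weighted coherent-average loss (assumed form, see lead-in).

    weights must be non-negative with a positive sum, e.g. log-scaled intensity.
    """
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()
    # The complex exponential makes the result insensitive to 2*pi wraps;
    # the modulus of the sum makes it insensitive to a constant offset.
    coh_same = np.abs(np.sum(w * np.exp(1j * (phase_pred - phase_true))))
    # Sign ambiguity: also compare against the sign-flipped target.
    coh_flip = np.abs(np.sum(w * np.exp(1j * (phase_pred + phase_true))))
    return 1.0 - max(coh_same, coh_flip)

rng = np.random.default_rng(0)
weights = rng.uniform(0.1, 1.0, size=512)
phase_true = rng.uniform(-np.pi, np.pi, size=512)

# Offset plus random 2*pi wraps: should cost (almost) nothing.
shifted = phase_true + 0.7 + 2 * np.pi * rng.integers(-2, 3, size=512)
# Sign flip plus offset: should also cost (almost) nothing.
flipped = -phase_true + 1.3

loss_shifted = wca_like_loss(weights, phase_true, shifted)
loss_flipped = wca_like_loss(weights, phase_true, flipped)
loss_random = wca_like_loss(weights, phase_true, rng.uniform(-np.pi, np.pi, size=512))
```

A plain mean-squared error on the phase would penalize all of these equivalent solutions, which is why a symmetry-aware loss is needed for stable training.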
4. Training Data Generation Methodology
The training dataset encompasses realistic particle geometries and physically meaningful strain fields. Crystal shapes are drawn from Wulff, Winterbottom, and "random" planar-cut configurations. Strain fields are constructed as discrete phase distributions built from two random Gaussians, two random cosines, or a Gaussian random field, with a prescribed total phase range inside each particle.
For each shape-strain configuration, multiple 3D diffraction patterns are simulated by rotating the crystal and varying the per-axis oversampling (always satisfying the Nyquist criterion). The forward scattering model is GPU-accelerated (PyNX) and produces single-Bragg-peak diffraction volumes on a 3D grid with Poisson noise applied. The simulated patterns are split into training, validation, and test sets.
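A heavily simplified version of this simulation pipeline can be sketched in NumPy. The source uses PyNX on GPU with Wulff, Winterbottom, and planar-cut shapes; the cuboid particle, the single Gaussian phase bump, the grid size, and all parameter names and defaults below are stand-in assumptions for illustration only.

```python
import numpy as np

def simulate_pattern(n=32, half_width=5, phase_amp=2.0, max_counts=1e4, seed=0):
    """Simulate one noisy 3D diffraction volume and its reciprocal-space phase."""
    rng = np.random.default_rng(seed)
    z, y, x = np.indices((n, n, n)) - n // 2

    # Hypothetical cuboid particle (stand-in for the shapes used in the source).
    support = (np.abs(x) <= half_width) & (np.abs(y) <= half_width) & (np.abs(z) <= half_width)

    # Phase field: one Gaussian bump with a random centre inside the particle.
    c = rng.uniform(-half_width, half_width, size=3)
    r2 = (x - c[0]) ** 2 + (y - c[1]) ** 2 + (z - c[2]) ** 2
    phase = phase_amp * np.exp(-r2 / (2.0 * half_width ** 2))
    obj = support * np.exp(1j * phase)

    # Forward model: centred 3D FFT and squared magnitude.
    far_field = np.fft.fftshift(np.fft.fftn(np.fft.ifftshift(obj)))
    intensity = np.abs(far_field) ** 2

    # Scale to a chosen peak count, then apply Poisson (photon-counting) noise.
    intensity *= max_counts / intensity.max()
    noisy = rng.poisson(intensity).astype(np.float64)

    # Training target: the noise-free reciprocal-space phase.
    target_phase = np.angle(far_field)
    return noisy, target_phase, obj

noisy, target_phase, obj = simulate_pattern()
```

With the defaults, the particle spans 11 of 32 voxels along each axis, so each axis is oversampled by a factor of roughly 2.9, above the factor of 2 required by the Nyquist criterion.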
5. Optimization Protocol and Training Hyperparameters
The model is trained in TensorFlow on two NVIDIA Tesla V100 GPUs, using the Adam optimizer with a fixed learning rate. Batches contain 16 volumes, and training runs for 60 epochs (roughly 30 wall-clock hours). No learning-rate decay, weight decay, or early stopping is used; the validation set serves for monitoring only.
6. Quantitative Evaluation and Benchmarking Against Iterative Algorithms
On the held-out simulated test set, the S-CNN achieves near-perfect phase prediction as measured by the WCA loss. Real-space reconstructions via inverse FFT show subpixel agreement with the ground truth in both modulus and phase (assessed visually; no explicit RMSE is reported). For experimental diffraction data, conventional phase-retrieval protocols ($400$ HIO + RAAR + $300$ ER, $60$ random seeds) fail, yielding incomplete reconstructions or over-shrunk supports. By contrast, the S-CNN prediction followed by $400$ ER steps (support-constrained, with the support updated at the boundary) produces high-fidelity solutions.
The S-CNN method is also much faster: CNN inference plus the inverse FFT, followed by the short ER refinement, completes quickly, whereas standard iterative approaches may require tens of minutes for many independent runs. This acceleration is accompanied by robustness to high-strain conditions, where conventional phase-retrieval algorithms consistently fail.
7. Output Post-processing and Physics-based Consistency
Once the reciprocal-space phase has been predicted, the complex amplitude is formed by combining it with the measured intensity, and the real-space image is generated by a single 3D inverse FFT. To remove minor artifacts and enforce physical consistency, a short error-reduction (ER) loop is run in PyNX. It imposes the measured amplitude in reciprocal space and a tight support in real space, with support updates restricted to boundary voxels only. This maintains support integrity and yields a refined solution that is consistent with both the diffraction data and the object constraints.
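The refinement idea can be illustrated with a plain NumPy error-reduction loop. This is a simplified sketch and not the PyNX implementation: it keeps the support fixed, whereas the source updates the support at boundary voxels. The function name, iteration count, and uncentred `np.fft` array layout are assumptions.

```python
import numpy as np

def error_reduction(obj0, measured_amp, support, n_iter=50):
    """Alternate between the measured-amplitude and support constraints.

    Arrays use the native (uncentred) np.fft layout. The support is fixed here.
    """
    obj = obj0 * support
    errors = []
    for _ in range(n_iter):
        F = np.fft.fftn(obj)
        # Relative mismatch between current and measured Fourier amplitudes.
        errors.append(np.linalg.norm(np.abs(F) - measured_amp) / np.linalg.norm(measured_amp))
        # Reciprocal space: keep the current phase, impose the measured amplitude.
        F = measured_amp * np.exp(1j * np.angle(F))
        # Real space: zero everything outside the support.
        obj = np.fft.ifftn(F) * support
    return obj, errors

# Toy problem: a cube with random phase, started from a perturbed Fourier phase.
rng = np.random.default_rng(2)
n = 16
support = np.zeros((n, n, n))
support[4:12, 4:12, 4:12] = 1.0
true_obj = support * np.exp(1j * 0.3 * rng.standard_normal((n, n, n)))
true_F = np.fft.fftn(true_obj)
measured_amp = np.abs(true_F)

# Stand-in for an imperfect network prediction: true phase plus noise.
noisy_phase = np.angle(true_F) + 0.3 * rng.standard_normal((n, n, n))
start = np.fft.ifftn(measured_amp * np.exp(1j * noisy_phase))

refined, errors = error_reduction(start, measured_amp, support, n_iter=50)
```

Error reduction never increases the Fourier-amplitude mismatch from one iteration to the next, which makes it a safe polishing step when the starting point is already close to the solution, as it is after a good phase prediction.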
In summary, the supervised 3D U-Net approach performs direct phase regression on log-scaled coherent diffraction volumes, using a custom symmetry-robust loss and large-scale simulation for realistic training coverage. The method is substantially faster and more robust than standard iterative phase-retrieval recipes, especially for highly strained crystalline samples, and provides a practical pipeline for rapid BCDI analysis (Masto et al., 9 Jul 2025).