
S-CNN: Supervised CNN for Phase Retrieval

Updated 8 January 2026
  • Supervised Convolutional Neural Network (S-CNN) is a deep learning approach that maps 3D diffraction intensities to reciprocal-space phase for robust phase retrieval in BCDI.
  • It uses a 3D U-Net–style encoder–decoder with dilated convolutions and skip connections, complemented by a symmetry-robust weighted coherent average loss function.
  • The method significantly accelerates phase reconstruction compared to traditional iterative algorithms, achieving high-fidelity real-space images even under extreme strain.

A Supervised Convolutional Neural Network (S-CNN) for phase retrieval in Bragg Coherent Diffraction Imaging (BCDI) is a 3D U-Net–style encoder–decoder architecture trained to predict the reciprocal-space phase directly from simulated diffraction intensities. Standard iterative phase-retrieval algorithms are limited in reconstructing highly strained crystals because of convergence problems under extreme strain. The S-CNN approach exploits the structure of the input and output representations, both of which live in reciprocal space, to address these computational obstacles and extend BCDI to highly strained particles. The method maps the measured diffraction intensities to the phase via supervised learning. This enables rapid and accurate phase reconstruction, after which the real-space object is recovered with an inverse Fourier transform followed by physics-consistent iterative refinement (Masto et al., 9 Jul 2025).

1. Network Architecture and Input Preprocessing

The foundation of the S-CNN is a fully 3D U-Net–style encoder–decoder comprising approximately 143 million parameters. Input data consists of batches of 3D diffraction intensity volumes with shape $(B, 64, 64, 64, 1)$, processed through logarithmic scaling $I' = \log(1 + I)$ and min–max normalization to $[0, 1]$. The encoder path contains four down-sampling stages; each stage executes two parallel $3 \times 3 \times 3$ convolutions with dilation rates $\{1, 2, 4\}$ and ReLU activations, followed by $2 \times 2 \times 2$ max pooling (stride 2). The bottleneck consists of two dilated $3 \times 3 \times 3$ convolutions (with dilations $1, 2, 4$) without pooling.
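The preprocessing step can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and normalizing each volume independently (rather than over the whole dataset) is an assumption the text does not settle.

```python
import numpy as np

def preprocess_intensity(volume):
    """Log-scale a diffraction volume, then min-max normalize it to [0, 1]."""
    scaled = np.log1p(np.asarray(volume, dtype=np.float64))  # I' = log(1 + I)
    lo, hi = scaled.min(), scaled.max()
    if hi == lo:
        # Flat volume: return zeros to avoid dividing by zero (assumed convention).
        return np.zeros_like(scaled)
    return (scaled - lo) / (hi - lo)

def make_batch(volumes):
    """Stack preprocessed 64^3 volumes into an array of shape (B, 64, 64, 64, 1)."""
    stacked = np.stack([preprocess_intensity(v) for v in volumes])
    return stacked[..., np.newaxis]  # trailing channel axis

# Synthetic photon counts stand in for measured diffraction data.
rng = np.random.default_rng(0)
counts = rng.poisson(5.0, size=(2, 64, 64, 64))
batch = make_batch(counts)
print(batch.shape, batch.min(), batch.max())
```

The logarithm compresses the large dynamic range between the Bragg peak and the weak fringes, so that the low-intensity fringes still contribute to the network input after normalization.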

Decoding reverses the process with four up-sampling stages, each applying a $2 \times 2 \times 2$ transposed convolution (stride 2), concatenation (skip connection) with the corresponding encoder feature map, and two $3 \times 3 \times 3$ convolutional + ReLU layers. The output head is a $1 \times 1 \times 1$ convolution without bounding activation, mapping to a single $64^3$ phase volume that enables arbitrary unwrapped phase predictions.
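The shape bookkeeping implied by this layout can be checked with a few lines of Python. The sketch below only tracks spatial sizes and the extent of a dilated kernel; it is not a model definition, and the helper names are hypothetical. The effective-extent formula $d(k-1)+1$ is the standard one for a kernel of size $k$ with dilation $d$.

```python
def encoder_sizes(size=64, stages=4):
    """Spatial edge length after each stride-2 pooling stage, input first."""
    sizes = [size]
    for _ in range(stages):
        size //= 2  # 2x2x2 max pooling with stride 2 halves each axis
        sizes.append(size)
    return sizes

def effective_kernel(kernel=3, dilation=1):
    """Edge length covered by a dilated convolution kernel."""
    return dilation * (kernel - 1) + 1

down = encoder_sizes()          # encoder path, ending at the bottleneck
up = down[::-1]                 # decoder path mirrors it back to the input size
spans = [effective_kernel(3, d) for d in (1, 2, 4)]
print(down, up, spans)
```

With four pooling stages the bottleneck operates on $4^3$ feature maps, and the three dilation rates let a $3 \times 3 \times 3$ kernel cover 3, 5, and 9 voxels per axis without adding parameters.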

2. Mathematical Formulation of Phase-Prediction Mapping

Let $q \in \mathbb{R}^3$ denote reciprocal-space voxel coordinates. The BCDI measurement is defined by the squared magnitude of the 3D Fourier transform:

$$I(q) = \left| F\{ \rho(x) e^{i\phi(x)} \} \right|^2$$

where $\rho(x)$ (modulus) and $\phi(x)$ (phase) describe the real-space object. The S-CNN is trained to approximate the inverse mapping:

$$\Phi : I(q) \mapsto \phi(q)$$

where $\phi(q)$ denotes the reciprocal-space phase. After S-CNN prediction $\hat{\phi}(q) = \Phi(I(q))$, the complex diffracted amplitude is reconstructed:

$$A(q) = \sqrt{I(q)} \ e^{i\hat{\phi}(q)}$$

The real-space reconstruction is then obtained via a single inverse Fourier transform:

$$\Psi(x) = F^{-1}\{A(q)\} = F^{-1}\{ \sqrt{I(q)} \ e^{i\hat{\phi}(q)} \}$$
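The reconstruction step can be sketched with NumPy's FFT routines. This is an illustrative sketch under two assumptions: the diffraction peak is stored at the array centre (hence the `ifftshift`), and the exact reciprocal-space phase of a toy object stands in for a perfect network prediction. The function name is hypothetical.

```python
import numpy as np

def reconstruct_object(intensity, predicted_phase):
    """Return Psi(x) = F^-1{ sqrt(I(q)) * exp(i * phi_hat(q)) }."""
    amplitude = np.sqrt(intensity) * np.exp(1j * predicted_phase)
    # Assumption: the peak is centred in the array, so undo the shift first.
    return np.fft.ifftn(np.fft.ifftshift(amplitude))

# Toy object on a 16^3 grid: a cube with a smooth real-space phase.
n = 16
modulus = np.zeros((n, n, n))
modulus[5:11, 5:11, 5:11] = 1.0
x = np.arange(n)[:, None, None]
phase = 0.3 * np.sin(2 * np.pi * x / n) * np.ones((n, n, n))
obj = modulus * np.exp(1j * phase)

# Forward model: centred diffracted field, its intensity, and its phase.
field = np.fft.fftshift(np.fft.fftn(obj))
intensity = np.abs(field) ** 2
true_phase = np.angle(field)  # stands in for a perfect prediction

psi = reconstruct_object(intensity, true_phase)
print(np.allclose(psi, obj))
```

When the phase is exact, a single inverse transform returns the object; any error in $\hat{\phi}(q)$ appears as artifacts in $\Psi(x)$, which is what the later refinement stage is meant to clean up.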

3. Loss Function and Symmetry Handling

The phase retrieval problem involves symmetry ambiguities: phase functions are only defined up to overall constant, ramp, wrap, and sign symmetries. The Weighted Coherent Average (WCA) loss is introduced for training, addressing these symmetries explicitly. For $N = 64^3$ voxels, $k = 1 \ldots N$, input intensity $I_{in,k}$ (log-scaled), ground-truth phase $\phi_{GT,k}$, and predicted phase $\phi_{pred,k}$:

$$L_+ = 1 - \left| \frac{1}{N} \sum_{k=1}^N I_{in,k} \exp[i(\phi_{GT,k} - \phi_{pred,k})] \right|$$

$$L_- = 1 - \left| \frac{1}{N} \sum_{k=1}^N I_{in,k} \exp[i(-\phi_{GT,k} - \phi_{pred,k})] \right|$$

$$L_{WCA} = \min(L_+, L_-)$$

The WCA loss automatically accommodates wrap ($2\pi$ periodicity, via the complex exponential) and sign ambiguity, and further weights phase mismatch by the intensity signal to focus optimization on high-signal regions.
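A NumPy transcription of the three formulas above makes the symmetry behaviour easy to verify. This is a sketch that follows the equations as written, including the $1/N$ normalization; the function name is hypothetical, and a training implementation would use differentiable tensor operations instead.

```python
import numpy as np

def wca_loss(weights, phase_gt, phase_pred):
    """Weighted Coherent Average loss, following L_+, L_- and their minimum."""
    w = np.ravel(weights)
    gt = np.ravel(phase_gt)
    pred = np.ravel(phase_pred)
    # Coherent (complex) average of the phase mismatch, weighted by intensity.
    l_plus = 1.0 - np.abs(np.mean(w * np.exp(1j * (gt - pred))))
    # Same average against the sign-flipped ground truth.
    l_minus = 1.0 - np.abs(np.mean(w * np.exp(1j * (-gt - pred))))
    return min(l_plus, l_minus)

rng = np.random.default_rng(0)
gt = rng.uniform(-np.pi, np.pi, size=(16, 16, 16))
ones = np.ones_like(gt)  # unit weights, used here only to isolate the symmetries

exact = wca_loss(ones, gt, gt)
flipped = wca_loss(ones, gt, -gt)                  # sign ambiguity
wrapped = wca_loss(ones, gt, gt + 2 * np.pi)       # 2*pi wrap
offset = wca_loss(ones, gt, gt + 0.7)              # constant offset
unrelated = wca_loss(ones, gt, rng.uniform(-np.pi, np.pi, size=gt.shape))
print(exact, flipped, wrapped, offset, unrelated)
```

The modulus of the coherent average also removes a constant phase offset, since a global factor $e^{ic}$ has unit magnitude. With the $1/N$ normalization as written and weights in $[0, 1]$, the smallest attainable value is $1 - \frac{1}{N}\sum_k I_{in,k}$, which is why the demonstration uses unit weights.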

4. Training Data Generation Methodology

The training dataset encompasses realistic particle geometries and physically meaningful strain fields. Crystal shapes are drawn from Wulff, Winterbottom, and "random" planar-cut configurations. Strain fields are constructed via discrete phase distributions from two random Gaussians, two random cosines, or a Gaussian random field, with the total phase range inside each particle spanning $[2\pi, 5\pi]$.
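One of the three strain-field families, the sum of two random Gaussians, might be generated along the following lines. Everything specific here is an assumption made for illustration: the function name, the sampling ranges for centres and widths, the random signs, and the choice to draw the total phase range uniformly from $[2\pi, 5\pi]$. The source states only the family and the range.

```python
import numpy as np

def two_gaussian_phase(support, rng):
    """Phase field from two random Gaussians, rescaled inside a cubic-grid support."""
    n = support.shape[0]
    axes = [np.arange(n)] * 3
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)  # (n, n, n, 3)
    field = np.zeros(support.shape)
    for _ in range(2):
        center = rng.uniform(0, n, size=3)      # assumed sampling range
        width = rng.uniform(n / 8, n / 3)       # assumed sampling range
        sign = rng.choice([-1.0, 1.0])
        r2 = np.sum((grid - center) ** 2, axis=-1)
        field += sign * np.exp(-r2 / (2 * width**2))
    inside = field[support]
    # Rescale so the phase range inside the particle equals the drawn target.
    target = rng.uniform(2 * np.pi, 5 * np.pi)
    field = (field - inside.min()) * target / (inside.max() - inside.min())
    return np.where(support, field, 0.0), target

rng = np.random.default_rng(0)
support = np.zeros((32, 32, 32), dtype=bool)
support[8:24, 8:24, 8:24] = True  # a cube stands in for a faceted particle
phase, target = two_gaussian_phase(support, rng)
print(np.ptp(phase[support]), target)
```

A phase range of several multiples of $2\pi$ inside the particle is what makes these samples representative of the highly strained regime.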

For each shape-strain configuration, multiple 3D diffraction patterns are simulated by rotating the crystal and varying per-axis oversampling (all $> 2 \times$ Nyquist). The forward scattering model is GPU-accelerated (PyNX), targeting single-Bragg-peak diffraction on a $64^3$ grid with Poisson noise overlay. Data splits consist of $95{,}000$ patterns for training, $4{,}000$ for validation, and $3{,}000$ for testing.
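A CPU-only stand-in for the simulation step is sketched below. It uses NumPy's FFT rather than PyNX, and the peak-count scaling via `max_counts` is an assumed way of setting the noise level; neither the function name nor that parameter comes from the source.

```python
import numpy as np

def simulate_pattern(modulus, phase, max_counts, rng):
    """Noisy diffraction pattern: |FFT(rho * exp(i*phi))|^2 with Poisson noise."""
    obj = modulus * np.exp(1j * phase)
    intensity = np.abs(np.fft.fftshift(np.fft.fftn(obj))) ** 2
    intensity *= max_counts / intensity.max()  # assumed scaling to a peak count
    return rng.poisson(intensity).astype(np.float64)

rng = np.random.default_rng(0)
n = 64
modulus = np.zeros((n, n, n))
modulus[24:40, 24:40, 24:40] = 1.0  # 16-voxel cube: oversampling ratio 4 per axis
z = np.linspace(-1, 1, n)[None, None, :]
phase = np.pi * z**2 * np.ones((n, n, n))  # smooth illustrative phase
pattern = simulate_pattern(modulus, phase, max_counts=1e5, rng=rng)
print(pattern.shape, pattern.max())
```

The oversampling ratio along an axis is the grid size divided by the object extent, so a 16-voxel object on a 64-voxel grid satisfies the greater-than-2 condition.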

5. Optimization Protocol and Training Hyperparameters

Model optimization is conducted in TensorFlow, leveraging two NVIDIA Tesla V100 GPUs. The Adam optimizer applies a fixed learning rate $\text{lr} = 10^{-4}$. Batches contain 16 volumes; training is run for 60 epochs (roughly 30 wall-clock hours). There is no explicit learning rate decay, weight decay, or early stopping beyond validation-set monitoring.
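The stated hyperparameters and dataset sizes imply a step count and a rough per-step cost, worked out below. The dictionary keys are hypothetical names, and keeping the final partial batch of each epoch (hence the ceiling) is an assumption.

```python
import math

config = {
    "optimizer": "adam",
    "learning_rate": 1e-4,   # fixed, no decay schedule
    "batch_size": 16,
    "epochs": 60,
    "train_patterns": 95_000,
    "val_patterns": 4_000,
    "test_patterns": 3_000,
}

# Assumption: the last, partially filled batch of each epoch is kept.
steps_per_epoch = math.ceil(config["train_patterns"] / config["batch_size"])
total_steps = steps_per_epoch * config["epochs"]
# Derived from the approximate 30-hour wall-clock figure.
seconds_per_step = 30 * 3600 / total_steps
print(steps_per_epoch, total_steps, round(seconds_per_step, 2))
```

This gives 5,938 steps per epoch and 356,280 steps in total, or about 0.3 s per optimizer step on the two-GPU setup, given that the 30-hour figure is itself approximate.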

6. Quantitative Evaluation and Benchmarking Against Iterative Algorithms

On the held-out simulated test set, the S-CNN achieves near-perfect phase prediction (WCA loss $< 10^{-3}$). Real-space reconstructions via inverse FFT show subpixel agreement with the ground truth in both modulus $\rho(x)$ and phase $\phi(x)$; this is a visual assessment, and no explicit RMSE is reported. For experimental diffraction data, a conventional phase-retrieval recipe ($400$ hybrid input-output (HIO) + $1{,}000$ relaxed averaged alternating reflections (RAAR) + $300$ error-reduction (ER) iterations, repeated over $60$ random seeds) fails, producing reconstructions with incomplete or overshrunk supports. By contrast, the S-CNN pipeline followed by $400$ ER steps (support-constrained, with only the boundary updated) generates high-fidelity solutions.

The S-CNN method achieves dramatic speed-up: CNN inference plus inverse FFT requires less than $10\ \mathrm{ms}$, ER refinement on a $64^3$ volume takes $6$–$10\ \mathrm{s}$, while standard iterative approaches may require tens of minutes for many independent runs. This acceleration (a factor of $10^2$–$10^3$) is accompanied by robustness to high-strain conditions, where conventional PR algorithms consistently fail.

7. Output Post-processing and Physics-based Consistency

Upon prediction of reciprocal-space phase $\hat{\phi}(q)$, the reconstructed amplitude $A(q)$ is computed with the measured intensity, and the real-space image $\Psi_{CNN}(x)$ is generated by a single 3D inverse FFT. To remove minor artifacts and enforce physical consistency, a short error-reduction loop ($\approx 400$ iterations) in PyNX is employed, imposing measured amplitude in reciprocal space and a tight support in real space, with updates restricted to boundary voxels only. This maintains support integrity and yields a refined solution $\Psi_{refined}(x)$ that is consistent with diffraction and object constraints.
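The alternating projections behind error reduction can be sketched in NumPy. This is a simplified stand-in for the PyNX loop, not a reproduction of it: the support is held fixed (the boundary-voxel update is omitted), the measured amplitude is assumed to be in unshifted FFT layout, and the function name is hypothetical.

```python
import numpy as np

def error_reduction(obj, measured_amp, support, n_iter=400):
    """Plain ER: alternate the Fourier-modulus and fixed-support projections."""
    obj = np.asarray(obj, dtype=complex) * support  # start inside the support
    errors = []
    for _ in range(n_iter):
        spectrum = np.fft.fftn(obj)
        # Fourier-space mismatch before the modulus constraint is applied.
        errors.append(np.linalg.norm(np.abs(spectrum) - measured_amp))
        # Reciprocal space: keep the current phase, impose the measured amplitude.
        spectrum = measured_amp * np.exp(1j * np.angle(spectrum))
        # Real space: zero everything outside the support.
        obj = np.fft.ifftn(spectrum) * support
    return obj, errors

rng = np.random.default_rng(0)
n = 16
support = np.zeros((n, n, n), dtype=bool)
support[5:11, 5:11, 5:11] = True
truth = support * np.exp(1j * rng.uniform(-0.5, 0.5, size=support.shape))
measured_amp = np.abs(np.fft.fftn(truth))

# A phase-perturbed copy of the truth stands in for the network's first estimate.
start = truth * np.exp(1j * rng.normal(0.0, 0.3, size=support.shape))
refined, errors = error_reduction(start, measured_amp, support, n_iter=50)
print(errors[0], errors[-1])
```

Because each projection moves the estimate to the nearest point of its constraint set, the Fourier-space error of ER never increases from one iteration to the next. That makes ER a safe finishing step when the starting estimate is already close, which is the role the network prediction plays here.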


In summary, the supervised 3D U-Net approach enables direct phase regression on log-scaled coherent diffraction volumes, with custom symmetry-robust loss handling and large-scale simulation for realistic training coverage. This method substantially outperforms and accelerates standard iterative phase-retrieval recipes, especially for highly strained crystalline samples, and provides a practical pipeline for rapid, robust BCDI analysis (Masto et al., 9 Jul 2025).
