
Computed Tomography Neural Operator (CTO)

Updated 20 December 2025
  • CTO is a unified operator-learning framework that maps measurement data to images by integrating neural operators with the physical principles of CT reconstruction.
  • It employs architectures like discrete-continuous convolutions, spectral neural operators, and continuous neural fields to achieve cross-resolution and cross-modality generalization.
  • Empirical benchmarks demonstrate that CTO achieves state-of-the-art speed and accuracy, outperforming traditional CNN and diffusion methods in sparse-view and high-resolution imaging.

A Computed Tomography Neural Operator (CTO) is a unified operator-learning framework that enables data-driven forward and inverse modeling for computed tomography (CT) in function space, integrating neural operators directly with the underlying physics of CT reconstruction. CTO approaches leverage neural operator architectures—such as discrete-continuous convolutional operators, spectral neural operators, and hybrid function-space models—to achieve cross-resolution, cross-sampling-rate, and cross-modality generalization. This class of models encompasses recent advances in image and wave CT (ultrasound, X-ray, cone-beam), providing state-of-the-art accuracy and rapid inference while retaining direct connections to underlying physical measurement and reconstruction operators (Datta et al., 13 Dec 2025, Zeng et al., 20 Jul 2025, Papa et al., 2023).

1. Mathematical Foundations and Operator Formulations

CTO frameworks formulate both measurement and reconstruction as mappings between infinite-dimensional function spaces. For X-ray CT, the measurement operator is the Radon transform $R$, mapping an image $x: \Omega \to \mathbb{R}$ to a sinogram $p(\theta, r) = R[x](\theta, r) + \epsilon$, where $(\theta, r)$ indexes projection angle and detector position, respectively, and $\epsilon$ models measurement noise. For wave imaging, e.g., ultrasound computed tomography (USCT), the relevant forward map is the solution operator of the Helmholtz equation

$$\left[\nabla^2 + \left(\frac{\omega}{c(x)}\right)^2\right] u(x) = -s(x)$$

with variable sound speed $c(x)$ and source $s(x)$. Inverse problems recover $c(x)$ (for USCT) or $x$ (for X-ray CT) from the corresponding observed field or projection data.
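As a concrete illustration of the X-ray forward model above, the following sketch simulates a sparse-view acquisition and a classical filtered-backprojection baseline with scikit-image. The phantom, angle count, and noise level are illustrative assumptions, not values from the cited papers.

```python
import numpy as np
from skimage.transform import radon, iradon

# Hypothetical piecewise-constant phantom x: a 256x256 image with a
# square insert (purely illustrative, not a dataset from the papers).
x = np.zeros((256, 256), dtype=np.float32)
x[96:160, 96:160] = 1.0

# Sparse-view acquisition: 18 uniformly spaced projection angles,
# matching the 18-view benchmark discussed later in this article.
theta = np.linspace(0.0, 180.0, num=18, endpoint=False)

# Forward model: sinogram p(theta, r) = R[x](theta, r) + eps.
p = radon(x, theta=theta)
rng = np.random.default_rng(0)
p_noisy = p + 0.01 * p.std() * rng.standard_normal(p.shape)

# Classical baseline: filtered backprojection, an approximate inverse of R;
# learned operators aim to outperform it under such severe undersampling.
x_fbp = iradon(p_noisy, theta=theta, filter_name="ramp")
```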

Operator-based learning in the CTO paradigm targets the mapping $\mathcal{F}: \text{(measurement)} \to \text{(property/image)}$, i.e.,

$$\text{(Sinogram } p \text{ or field } u\text{)} \mapsto \text{(Image } x \text{ or property } c\text{)}$$

where all elements are represented as functions (or discretizations thereof), and the operator is learned end-to-end (Datta et al., 13 Dec 2025, Zeng et al., 20 Jul 2025, Dai et al., 2023).

2. Core Architectures and Neural Operator Classes

CTOs employ explicit operator-theoretic inductive biases and architectures suited for function-to-function learning, achieving generalization beyond fixed discretizations. The principal approaches include:

  • Discrete-Continuous (DISCO) Convolutional Operators: DISCO convolutions define convolution kernels $\kappa(z)$ in continuous space, parameterized via a linear basis (e.g., isotropic/anisotropic rings), ensuring discretization agnosticism and compatibility across irregular and continuous grids. DISCO layers support rotation-equivariant design and scale gracefully with grid resolution (Datta et al., 13 Dec 2025).
  • Sinogram- and Image-Domain Operator Staging: CTOs operate sequentially in the sinogram and image domains. The pipeline applies a sinogram-space neural operator ($\text{NO}_\text{s}$) with rotation-equivariant and frequency-processing branches, followed by a cascade of image-space operators ($\text{NO}_\text{i}$) interleaved with data-consistency steps reflecting the physical forward model (Datta et al., 13 Dec 2025).
  • Spectral Neural Operators: Fourier Neural Operators (FNO), Born-series–augmented FNO (BFNO), Multigrid Neural Operator (MgNO), and Adaptive FNO (AFNO) replace spatial convolutions with spectral multipliers, incorporate multi-scale or iterative-scattering inductive biases, and enable high-accuracy approximation of the underlying partial differential equation (PDE) solution maps in USCT (Zeng et al., 20 Jul 2025); a minimal spectral-layer sketch follows this list.
  • Continuous Neural Fields with Modulation: For cone-beam CT and sparse-view CT, the density or attenuation field $\mu(x)$ is parameterized as a continuous neural field, modulated by patient-specific or sample-specific learned modulation fields to achieve both anatomical prior sharing and case-specific adaptation (Papa et al., 2023).
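To make the spectral-multiplier idea concrete, here is a minimal PyTorch sketch of an FNO-style spectral convolution layer, one of the operator classes listed above. The class name, mode counts, and initialization are illustrative assumptions; the cited architectures add lifting/projection layers, pointwise paths, and other refinements.

```python
import torch
import torch.nn as nn

class SpectralConv2d(nn.Module):
    """FNO-style layer: a learned multiplier on the lowest Fourier modes,
    so the mapping is defined on functions rather than on a fixed grid."""

    def __init__(self, channels: int, modes1: int, modes2: int):
        super().__init__()
        self.modes1, self.modes2 = modes1, modes2
        scale = 1.0 / (channels * channels)
        # Complex-valued weights mixing channels on retained modes.
        self.weights = nn.Parameter(
            scale * torch.randn(channels, channels, modes1, modes2,
                                dtype=torch.cfloat)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W), sampled on any grid resolution.
        x_ft = torch.fft.rfft2(x)              # to Fourier space
        out_ft = torch.zeros_like(x_ft)
        # Truncate to the lowest modes1 x modes2 modes and mix channels.
        out_ft[:, :, :self.modes1, :self.modes2] = torch.einsum(
            "bixy,ioxy->boxy",
            x_ft[:, :, :self.modes1, :self.modes2], self.weights
        )
        return torch.fft.irfft2(out_ft, s=x.shape[-2:])  # back to the grid
```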

3. Training Regimes, Datasets, and Optimization

CTO models are supervised using paired measurement and ground-truth image/property samples. Training leverages large-scale, high-fidelity, and anatomically realistic synthetic datasets—for example, OpenBreastUS for USCT (8,000 breast phantoms; over 16 million field solutions), and KiTS19 and AAPM for X-ray CT. For ultrasound and wave imaging, ground-truth solutions are typically computed by convergent Born series (CBS) PDE solvers, ensuring that the neural operator surrogate closely approximates accurate physical forward models (Zeng et al., 20 Jul 2025, Zeng et al., 2023).

Loss functions include relative $L^2$ or normalized mean-squared error (MSE) in the forward domain, and direct MSE or $L_1$ loss (optionally with SSIM regularization) in the inverse or reconstruction domain. All operators are implemented in modern deep-learning frameworks (PyTorch) and optimized with AdamW or Adam. Batch sizes and spectral/angular resolution are set to match application-specific inference and memory constraints (Zeng et al., 20 Jul 2025, Datta et al., 13 Dec 2025).
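For reference, a minimal PyTorch implementation of the relative $L^2$ training loss described above might look as follows; the epsilon guard, batch averaging, and optimizer hyperparameters are assumptions, not details from the papers.

```python
import torch

def relative_l2_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # ||pred - target||_2 / ||target||_2 per sample, averaged over the batch.
    diff = (pred - target).flatten(start_dim=1).norm(dim=1)
    ref = target.flatten(start_dim=1).norm(dim=1)
    return (diff / (ref + 1e-12)).mean()

# Typical optimizer setup mentioned above (hyperparameters assumed):
# model = ...  # a neural operator
# opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
```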

4. Performance, Generalization, and Benchmarks

Empirical performance is established on multiple metrics—root mean-squared error (RMSE), PSNR, and SSIM—for both forward and inverse tasks, and at varying levels of angular/projective sampling and spatial resolution; a short metric sketch follows the list below. CTOs achieve:

  • Multi-sampling-rate and cross-resolution generalization: CTOs deploy at arbitrary sampling and image resolutions not seen in training, with zero-shot super-resolution demonstrating a $>3$ dB PSNR gain over CNN baselines.
  • High-accuracy reconstructions under severe undersampling: For 18-view sparse CT, CTO attains RMSE $53.5$ HU and PSNR $36.11$ dB, substantially outperforming CNNs (RMSE $82.5$ HU, PSNR $32.66$ dB) and diffusion methods (RMSE $81.7$ HU, PSNR $32.50$ dB) (Datta et al., 13 Dec 2025).
  • Ultra-fast inference: CTO and spectral neural operator models yield typical inference times of $0.013$–$0.065$ seconds per image (GPU), versus $>30$ seconds for diffusion approaches and $>300$ seconds for classical PDE-based solvers (Zeng et al., 20 Jul 2025, Datta et al., 13 Dec 2025).
  • Robust anatomical generalization: Hybrid physics-augmented CTOs maintain high-fidelity inverse reconstructions on real in vivo breast USCT as well as simulated cases, outperforming direct encoder–decoder or trunk–branch neural inverse operators (Zeng et al., 20 Jul 2025).
  • Accurate neural surrogates for wave physics: For neural Born series–based operators (NBSO), forward-modeling error (RRMSE) is 20.7% for breast and 19.0% for brain phantoms, with FWI reconstruction PSNR of $27.8$ dB (NBSO-FWI) versus $33.1$ dB for CBS-FWI (Zeng et al., 2023).
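As referenced above, the RMSE and PSNR figures can be computed as follows; the Hounsfield-unit convention and the choice of data range are illustrative assumptions.

```python
import torch

def rmse(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Root mean-squared error; reported in HU when both images are in HU.
    return torch.sqrt(torch.mean((pred - target) ** 2))

def psnr(pred: torch.Tensor, target: torch.Tensor,
         data_range: float) -> torch.Tensor:
    # Peak signal-to-noise ratio in dB for a given dynamic range,
    # e.g. data_range = float(target.max() - target.min()).
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10((data_range ** 2) / mse)
```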

Table: Representative CTO Benchmark Results

Model / Task          Metric         CTO     CNN     Diffusion   CBS / FDFD
USCT FWI (Hybrid)     PSNR (dB)      30.5    20–26   —           33.1
Sparse CT (18 views)  PSNR (dB)      36.1    32.7    32.5        —
Inference             Time (s/img)   0.065   0.417   ~51.6       >300

A plausible implication is that CTOs can reconstruct from far fewer measurements and operate at higher resolutions than prior CNN-based or adversarial neural models, owing to both their operator-theoretic inductive bias and their function-space parametrization (Datta et al., 13 Dec 2025, Zeng et al., 20 Jul 2025, Zeng et al., 2023).

5. Physical Integration and Physics Constraints

Key CTO designs integrate the physical forward operator and its adjoint directly into the learning pipeline. For example, unrolled CTO variants replace standard CNN regularizers in variational iterative schemes with operator-based updates that explicitly leverage the measurement operator $A$ and its adjoint $A^*$. Data-consistency updates, physics-consistent loss design, and rotation-equivariant convolutions in sinogram space collectively enforce physically meaningful mappings, ensuring robustness to acquisition variation and, crucially, to object orientation (Datta et al., 13 Dec 2025). In the NBSO architecture, the structure of the Born series and Lippmann–Schwinger equation is embedded in the network by mimicking iterations of the CBS operator $\mathcal{M}$, ensuring convergence and stability even in the presence of strong scattering (Zeng et al., 2023).
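A minimal sketch of one such unrolled iteration, assuming callables A and At for the measurement operator and its adjoint and a learned image-space regularizer; the step size and residual form are illustrative, not the exact update from (Datta et al., 13 Dec 2025).

```python
import torch

def unrolled_iteration(x, p, A, At, regularizer, eta=0.1):
    """One step of a physics-integrated unrolled scheme.

    A / At      : callables applying the measurement operator and its adjoint
    regularizer : a neural operator acting in image space
    """
    # Data-consistency gradient step on 0.5 * ||A x - p||^2.
    x = x - eta * At(A(x) - p)
    # Learned operator-based correction replacing a hand-crafted prior.
    return x - regularizer(x)

# Toy usage with an identity "measurement" and a zero regularizer:
x0 = torch.zeros(1, 1, 64, 64)
p = torch.randn(1, 1, 64, 64)
x1 = unrolled_iteration(x0, p, A=lambda v: v, At=lambda v: v,
                        regularizer=lambda v: torch.zeros_like(v))
```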

6. Modalities, Extensions, and Limitations

CTOs have been instantiated and validated for various tomographic paradigms:

  • X-ray CT: Radon transform and filtered backprojection operators are used; CTO enables joint sinogram-image operator learning and outperforms both discrete CNNs and diffusion models across anatomical sites, sampling rates, and resolutions (Datta et al., 13 Dec 2025).
  • Ultrasound CT / USCT: Wave equation solvers (Helmholtz, Born series) are learned directly; neural surrogate operators offer 4–5 orders of magnitude speedup for forward/adjoint FWI, facilitating near real-time inversion (Zeng et al., 20 Jul 2025, Zeng et al., 2023).
  • Cone-beam CT (CBCT): Continuous neural fields and patient-specific neural modulation fields condition the operator, enhancing soft-tissue recovery under low-dose, sparse-projection regimes (Papa et al., 2023).
  • Prospective extension: CTOs can be adapted to other forward models (e.g., Radon transform for optical or electromagnetic tomography, elastic wave forward models for non-destructive evaluation), maintaining the correction-based operator structure (Zeng et al., 2023).

Limitations include:

  • Current demonstrations are 2D; extension to 3D and limited-angle/heterogeneous modalities requires further architectural and computational refinement (Datta et al., 13 Dec 2025, Zeng et al., 20 Jul 2025).
  • For USCT, current datasets do not include measurement noise or heterogeneous loss, and generalization to real-world 3D in vivo scenarios remains under investigation.
  • End-to-end inverse networks (e.g., InversionNet) may struggle with generalization and do not consistently surpass hybrid physics-augmented CTOs in accuracy or robustness (Zeng et al., 20 Jul 2025).

7. Design Validations, Ablations, and Future Directions

Ablations validate that CTO's core features—rotation-equivariant DISCO convolutions, frequency-branch processing in sinogram space, and function-space–parametrized kernels—are crucial for resolution-agnostic generalization and state-of-the-art accuracy. Eliminating the frequency branch or the equivariant features degrades PSNR by $0.8$–$1$ dB under typical sparse-view regimes (Datta et al., 13 Dec 2025).

Future directions encompass:

  • Physics-informed regularization (e.g., total-variation, anatomical-prior embedding) to further improve noise robustness;
  • Compression and quantization for deployment on accelerated hardware (TPU, GPU tensor cores);
  • Expanding function-space operator learning to incorporate attenuation, scattering, and shear effects necessary for practical clinical imaging (Datta et al., 13 Dec 2025, Zeng et al., 20 Jul 2025, Zeng et al., 2023).

CTOs thus constitute a general, operator-theoretic, and empirically validated framework for high-resolution, physically consistent, and computationally efficient CT reconstruction across imaging modalities.
