Joint Reconstruction Model: Coupled Inference

Updated 4 July 2026

Joint Reconstruction Model (JRM) is a coupled inverse formulation that simultaneously reconstructs correlated variables, enhancing fidelity and robustness.
It leverages shared priors such as common edges, latent factors, and motion fields to improve data efficiency compared to independent reconstruction methods.
Practical applications of JRM include accelerated MRI, PET–MRI co-reconstruction, dynamic CT, and emerging 3D object reconstruction, often showing significant quantitative gains.

Joint Reconstruction Model (JRM) denotes a class of coupled inverse formulations in which reconstruction is performed jointly over multiple variables rather than by a sequential pipeline. In the arXiv literature, the acronym is used for several technically distinct models: optimized MRI sampling and reconstruction, PET–MRI co-reconstruction, motion-corrected CBCT, dynamic inverse problems with low-rank structure, multi-view and multi-energy CT reconstruction, joint reconstruction with registration, super-resolution, or segmentation, and, more recently, object-centric 3D reconstruction without alignment (Aggarwal et al., 2019, Xie et al., 2024, Paepe et al., 18 Apr 2025, Arridge et al., 2020, Toivanen et al., 2019, Wu et al., 27 Mar 2026). The common premise is that jointly estimated variables share structure (edges, latent factors, priors, motion fields, or repeated object identity) and that explicit coupling can improve fidelity, robustness, or data efficiency relative to independent reconstruction.

1. Scope and nomenclature

The term is not attached to a single canonical model. Instead, it appears as a recurring designation for formulations that reconstruct correlated images, modalities, views, channels, or object instances under a shared constraint set or shared prior. This includes multichannel MRI with jointly learned sampling and reconstruction, PET–MRI joint priors, joint reconstruction and motion estimation, joint reconstruction and low-rank decomposition, and joint reconstruction from repeated but unaligned object observations (Aggarwal et al., 2019, Xie et al., 2024, Paepe et al., 18 Apr 2025, Arridge et al., 2020, Wu et al., 27 Mar 2026).

Usage	Jointly estimated entities	Representative papers
Accelerated MRI	sampling pattern, image, network parameters, or multi-contrast edges	(Aggarwal et al., 2019, Chen et al., 2017, Mani et al., 2020, Bian et al., 2022)
PET–MRI	PET image and MRI image under shared structural or generative priors	(Xie et al., 2024, Xie et al., 2023, Choi et al., 2017, Rasch et al., 2017)
Motion/dynamics	image with motion, registration, or low-rank temporal factors	(Paepe et al., 18 Apr 2025, Corona et al., 2019, Chen et al., 2018, Arridge et al., 2020)
CT and multi-view imaging	multi-energy channels, segmented reconstructions, or correlated views	(Toivanen et al., 2019, Storath et al., 2014, Thirumalai et al., 2012)
3D object reconstruction	repeated unaligned object instances with shared subject identity	(Wu et al., 27 Mar 2026)

A closely related acronym also appears in CO$_2$-plume monitoring as the Joint Recovery Method, where multiple surveys share a common background model and survey-specific plume differences; its probabilistic extension, pJRM, adds posterior uncertainty estimation (Deng et al., 30 Jan 2025). This suggests that the central idea of JRM is broader than any specific modality: shared latent structure is promoted directly at reconstruction time.

2. Core coupling mechanisms

The coupling in JRM appears in several mathematically distinct forms. In Bayesian PET–MRI reconstruction, the posterior is written as

$p(u,v|f,g)\propto p(f|u)\,p(g|v)\,p(u,v),$

so that dependence is carried by a learned joint prior $p(u,v)$ rather than by hand-crafted edge penalties (Xie et al., 2024). In multi-contrast MRI, Chen, Fang, and Ye reformulate vectorial total variation by reconstructing the Jacobian $v(x)=Du(x)$ directly and solving

$\min_v \alpha\|v\|_{L^1_*}+H(v),$

which encodes the empirical fact that anatomical structures across contrasts share the same edges (Chen et al., 2017).
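The effect of edge sharing on a joint penalty can be illustrated numerically. The following is a minimal sketch, not the formulation of Chen et al.: it assumes a discrete, Frobenius-type joint total variation (the $\|\cdot\|_{L^1_*}$ norm in the source may be defined differently), forward differences, and two hypothetical step images. The names `forward_gradient`, `joint_tv`, `separate_tv`, and `step_image` are illustrative only.

```python
import numpy as np

def forward_gradient(u):
    """Forward differences along both axes; the far border gets a zero difference."""
    gx = np.diff(u, axis=0, append=u[-1:, :])
    gy = np.diff(u, axis=1, append=u[:, -1:])
    return gx, gy

def joint_tv(images):
    """Sum over pixels of the Euclidean norm of the stacked gradients of all images."""
    energy = np.zeros(np.shape(images[0]), dtype=float)
    for u in images:
        gx, gy = forward_gradient(np.asarray(u, dtype=float))
        energy += gx**2 + gy**2
    return float(np.sqrt(energy).sum())

def separate_tv(images):
    """Sum of the isotropic TV of each image taken on its own (no coupling)."""
    return sum(joint_tv([u]) for u in images)

def step_image(edge_col, size=8):
    """Hypothetical test image: zero left of edge_col, one from edge_col onward."""
    u = np.zeros((size, size))
    u[:, edge_col:] = 1.0
    return u

aligned = [step_image(4), step_image(4)]   # both contrasts share the edge
shifted = [step_image(4), step_image(6)]   # edges at different locations
```

Under these assumptions, `joint_tv(aligned)` is smaller than `joint_tv(shifted)`, while `separate_tv` gives the same value for both pairs. This is the sense in which a joint penalty rewards coincident edges.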

Other JRMs couple variables through decomposition. In time-lapse seismic monitoring, each survey image is written as $x_i=m+\delta x_i$, where $m$ is a common background and $\delta x_i$ is a sparse innovation; the coupling arises because all surveys share the same $m$ (Deng et al., 30 Jan 2025). In dynamic inverse problems, the unknown movie $X$ is constrained to admit a nonnegative low-rank factorization, so that spatial bases and temporal coefficients are estimated jointly from undersampled data (Arridge et al., 2020). In optimized MRI acquisition, J-MoDL couples the sampling pattern and the CNN parameters through the reconstruction map and trains both simultaneously (Aggarwal et al., 2019).
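The decomposition $x_i=m+\delta x_i$ can be written as a single block linear system in which every survey sees the common component. The sketch below is an assumption-laden illustration: the per-survey operators are generic dense matrices, and a minimum-norm least-squares solve stands in for the sparsity-promoting solver a real method would use. The function names `joint_recovery_matrix` and `solve_joint` are hypothetical.

```python
import numpy as np

def joint_recovery_matrix(operators):
    """Stack per-survey operators A_i so that row block i acts on m + dx_i.

    Unknowns are ordered as z = [m, dx_1, ..., dx_k].
    """
    k = len(operators)
    rows = []
    for i, A in enumerate(operators):
        # Survey i sees the common part m and only its own innovation dx_i.
        innovations = [A if j == i else np.zeros_like(A) for j in range(k)]
        rows.append(np.hstack([A] + innovations))
    return np.vstack(rows)

def solve_joint(operators, data):
    """Minimum-norm least-squares solve (an assumed stand-in for a sparse solver)."""
    n = operators[0].shape[1]
    k = len(operators)
    big_A = joint_recovery_matrix(operators)
    z, *_ = np.linalg.lstsq(big_A, np.concatenate(data), rcond=None)
    m = z[:n]
    innovations = [z[(i + 1) * n:(i + 2) * n] for i in range(k)]
    return m, innovations
```

Each reconstructed survey is then `m + innovations[i]`. Because the column block for `m` appears in every row block, all surveys constrain the common component, which is the source of the coupling.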

A plausible implication is that JRM is best viewed as a coupling strategy rather than a model family with fixed algebraic form. The shared object of estimation may be a prior, a background, an edge set, a deformation, a factorization, a sampling design, or a latent subject identity.

3. Variational and model-based formulations

Classical JRMs are predominantly variational. In joint edge reconstruction for multi-contrast MRI, the original vectorial-TV problem over the images $u$ is recast as a minimization of $\alpha\|v\|_{L^1_*}+H(v)$ over the Jacobian $v=Du$, with Fourier-domain data fidelity derived explicitly for the edge variables. The resulting optimization admits FISTA with an $O(1/k^2)$ convergence rate and closed-form matrix-valued shrinkage, while the per-iteration cost is dominated by FFTs (Chen et al., 2017).
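The FISTA iteration referred to here has a simple generic structure: a gradient step on the smooth term, a shrinkage step, and a momentum extrapolation. The sketch below assumes a plain $\ell_1$-regularized least-squares problem with a dense matrix `A`; scalar soft-thresholding stands in for the matrix-valued shrinkage of the edge model, and the Fourier-domain fidelity is not reproduced. The names `fista` and `lasso_objective` are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal map of t * ||.||_1 (scalar shrinkage)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_objective(A, b, x, alpha):
    return 0.5 * np.sum((A @ x - b) ** 2) + alpha * np.sum(np.abs(x))

def fista(A, b, alpha, n_iter=200):
    """FISTA for min_x 0.5 * ||A x - b||^2 + alpha * ||x||_1."""
    lipschitz = np.linalg.norm(A, 2) ** 2   # Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        x_new = soft_threshold(y - grad / lipschitz, alpha / lipschitz)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # momentum extrapolation
        x, t = x_new, t_new
    return x
```

The momentum sequence `t` is what separates FISTA from plain proximal gradient descent and gives the $O(1/k^2)$ rate in the objective value.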

In joint PET–MRI reconstruction via coupled Bregman iterations, coupling is expressed through generalized Bregman distances and their infimal convolution with respect to TV. The method is designed to handle different intensity scales by relying on TV subgradients rather than raw gradient magnitudes, and includes channel-weighting parameters that modulate the amount of interaction between PET and MRI (Rasch et al., 2017). A different PET–MRI line uses joint sparsity of tight-frame coefficients. Choi, Bao, and Zhang formulate a nonconvex balanced model with a joint-sparsity penalty, solve it by proximal alternating minimization, and prove global convergence of the iterates to a critical point via the Kurdyka–Łojasiewicz property (Choi et al., 2017).

For multi-energy CT, the joint variable is the full set of energy-channel images, and the coupling regularizer can be chosen as joint total variation, linear parallel level sets, spectral smoothness, or an SSIM-based structure function. The acquisition protocol itself is coupled across channels by using non-overlapping, interleaved projection angles, so the inverse problem and the data-collection design are coordinated at the model level (Toivanen et al., 2019). In multi-view compressed imaging, depth-induced warping matrices and occlusion masks impose geometric consistency constraints between views, and the overall convex program is solved by the Parallel Proximal Algorithm (Thirumalai et al., 2012).
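The non-overlapping, interleaved angle protocol for multi-energy CT can be made concrete with a short sketch. The total number of angles, the number of energy channels, and the 180-degree span below are hypothetical choices for illustration, not values taken from the cited paper.

```python
import numpy as np

def interleaved_angles(n_total, n_channels, span_deg=180.0):
    """Split n_total equally spaced angles into disjoint, interleaved subsets.

    Channel c receives every n_channels-th angle starting at index c, so no
    angle is measured at more than one energy.
    """
    angles = np.arange(n_total) * (span_deg / n_total)
    return [angles[c::n_channels] for c in range(n_channels)]

# Example: 12 angles shared among 3 energy channels (hypothetical numbers).
per_channel = interleaved_angles(12, 3)
```

In this example each channel is sampled at a quarter of the density a full scan at every energy would need per channel set, but the union of the channels still covers all 12 directions. This is why a coupling regularizer can transfer structure between channels.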

Joint reconstruction can also absorb motion, registration, or segmentation. The Potts-model approach of Storath et al. introduces auxiliary variables so that reconstruction and segmentation are solved together; its ADMM splitting reduces the nonsmooth nonconvex problem to 1D Potts subproblems and a Tikhonov step, and the method recovers all segments of the Shepp–Logan phantom from only a few angular views (Storath et al., 2014). In variational multi-task MRI, the unknown high-resolution image, the deformation maps, and the super-resolution operator appear in one functional with a data-fidelity term, weighted TV, and an Ogden-type hyperelastic penalty, yielding a single model for reconstruction, registration, and super-resolution (Corona et al., 2019). A related spatiotemporal formulation places the motion variable in an LDDMM framework and optimizes a template jointly with a time-dependent velocity field, with explicit continuous and time-discrete Euler–Lagrange conditions (Chen et al., 2018).

4. Deep, diffusion, and latent-space formulations

Deep JRM formulations retain the coupling idea but move the regularizer or latent prior into learned modules. J-MoDL is a model-based deep learning architecture for accelerated MRI in which the multichannel forward operator depends on the sampling pattern, and the reconstruction solves a regularized inverse problem that pairs data consistency with a CNN-based prior. Unrolling the iterations yields a differentiable network that is trained jointly over the sampling parameters and the CNN parameters. The data-consistency block is implemented by conjugate-gradient steps, gradients propagate through FFT/NUFFT and CG unrolling, and a parametric reduction of the sampling pattern is used to keep the search space small (Aggarwal et al., 2019).
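A data-consistency block built from conjugate-gradient steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the J-MoDL implementation: it uses a single-coil Cartesian model with a binary mask as the sampling pattern, an orthonormal FFT, and the commonly used normal equations $(A^H A+\lambda I)x=A^H b+\lambda z$, where `z` plays the role of the CNN output. The function name and the parameter `lam` are hypothetical.

```python
import numpy as np

def data_consistency_cg(z, b, mask, lam, n_iter=10):
    """Solve (A^H A + lam I) x = A^H b + lam z with A = mask * FFT, by CG."""
    def normal_op(x):
        kspace = mask * np.fft.fft2(x, norm="ortho")
        return np.fft.ifft2(kspace, norm="ortho") + lam * x

    rhs = np.fft.ifft2(mask * b, norm="ortho") + lam * z
    x = np.zeros_like(rhs)
    r = rhs.copy()                      # residual for the zero initial guess
    p = r.copy()
    rs = np.vdot(r, r).real
    for _ in range(n_iter):
        if rs < 1e-20:                  # already converged
            break
        Ap = normal_op(p)
        step = rs / np.vdot(p, Ap).real
        x = x + step * p
        r = r - step * Ap
        rs_new = np.vdot(r, r).real
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

Every operation here is differentiable with respect to `z`, and also with respect to `mask` if the mask is relaxed to continuous values. This is what lets gradients flow through an unrolled network to both the regularizer and the sampling parameters.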

In PET–MRI, diffusion-based JRMs replace explicit regularizers with score-based generative priors. One model learns the full joint distribution of paired PET–MRI images through a variance-exploding SDE, uses a 2D U-Net score network with sinusoidal time embeddings, FiLM, and cross-attention, and applies PET and MRI data-consistency gradient steps during reverse diffusion (Xie et al., 2024). MC-Diffusion uses a mutual-consistency-driven diffusion prior, a backbone based on the multi-path RefineNet U-Net, and predictor–corrector sampling with fidelity gradients for Poisson PET and undersampled Fourier MRI (Xie et al., 2023). Both formulations treat joint reconstruction as posterior sampling under a learned multimodal prior.
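The interplay between a joint prior score and two data-consistency gradients can be illustrated in a deliberately simple toy setting. The sketch below swaps in several assumptions: an analytic pixelwise Gaussian prior with correlation `rho` replaces the learned score network, deterministic gradient ascent to the posterior mode replaces stochastic reverse diffusion, and identity and masked-identity operators with Gaussian noise replace the PET projector and undersampled Fourier operator. All names and parameter values are hypothetical.

```python
import numpy as np

def joint_map(f, g, mask, rho, sigma=0.5, step=0.05, n_iter=500):
    """Gradient ascent on log p(f|u) + log p(g|v) + log p(u, v).

    Assumed toy model: f = u + noise, g = mask * (v + noise), and per pixel
    (u, v) ~ N(0, [[1, rho], [rho, 1]]).
    """
    u = np.zeros_like(f)
    v = np.zeros_like(g)
    c = 1.0 / (1.0 - rho**2)
    for _ in range(n_iter):
        score_u = -c * (u - rho * v)          # d/du of the joint log-prior
        score_v = -c * (v - rho * u)          # d/dv of the joint log-prior
        dc_u = (f - u) / sigma**2             # data-consistency gradient, channel u
        dc_v = mask * (g - v) / sigma**2      # data-consistency gradient, channel v
        u = u + step * (score_u + dc_u)
        v = v + step * (score_v + dc_v)
    return u, v
```

With `rho=0` the two channels decouple and unobserved entries of `v` stay at zero. With `rho` close to one, the prior score pulls `v` toward the well-observed `u`, which is the toy analogue of one modality filling in structure for the other.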

A related learnable variational model for multimodal MRI reconstructs multiple source contrasts and synthesizes a target contrast simultaneously. The objective contains source data fidelity, feature-sparsity terms induced by modality-specific feature extractors, and a synthesis penalty linking the target image to a multimodal synthesis network. The resulting optimization is unrolled into a multi-phase network and trained with a bilevel-optimization framework (Bian et al., 2022). In diffusion MRI, joint $k$–$q$ reconstruction uses a pre-trained denoising autoencoder as a plug-and-play regularizer inside a model-based reconstruction, so that all diffusion-weighted images are recovered together from joint under-sampling across spatial frequency and diffusion encoding (Mani et al., 2020).

The most recent extension moves beyond medical imaging. The 2026 JRM for multiple objects without alignment is a 3D flow-matching generative model built on a VecSets-based VAE and ShapeR’s DiT. Its Coupled Fusion Block concatenates latent tokens from multiple object instances, applies single-stream attention and MLP processing across instances, and splits the tokens back, thereby aggregating repeated but unaligned observations without explicit registration (Wu et al., 27 Mar 2026). In motion-corrected head CBCT, JRM-ADM inserts a wavelet-domain diffusion prior into a blind joint reconstruction and motion estimation loop; each diffusion step alternates denoising, a proximal volume update, and a rigid-motion update parameterized by B-spline coefficients (Paepe et al., 18 Apr 2025).
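The concatenate, attend, and split pattern described for the Coupled Fusion Block can be sketched in a few lines. This is an illustrative simplification under explicit assumptions: single-head attention, random untrained weights, a ReLU MLP, residual connections, and no normalization or conditioning. None of these details are taken from the cited paper, and `init_params` and `coupled_fusion` are hypothetical names.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(dim, hidden, seed=0):
    """Random placeholder weights; a trained model would supply these."""
    rng = np.random.default_rng(seed)
    shapes = {"Wq": (dim, dim), "Wk": (dim, dim), "Wv": (dim, dim),
              "Wo": (dim, dim), "W1": (dim, hidden), "W2": (hidden, dim)}
    return {name: rng.standard_normal(s) / np.sqrt(s[0]) for name, s in shapes.items()}

def coupled_fusion(instance_tokens, p):
    """Concatenate tokens of all instances, attend across them, then split back."""
    lengths = [t.shape[0] for t in instance_tokens]
    x = np.concatenate(instance_tokens, axis=0)            # one token stream
    q, k, v = x @ p["Wq"], x @ p["Wk"], x @ p["Wv"]
    attn = softmax(q @ k.T / np.sqrt(x.shape[1]))          # every token sees every instance
    x = x + (attn @ v) @ p["Wo"]                           # residual attention update
    x = x + np.maximum(x @ p["W1"], 0.0) @ p["W2"]         # residual MLP update
    return np.split(x, np.cumsum(lengths)[:-1], axis=0)    # restore per-instance tokens
```

Because attention runs over the concatenated stream, each instance's output tokens depend on the tokens of its companions. No pose or correspondence between instances is supplied, which mirrors the registration-free aggregation described above.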

5. Reported applications and empirical behavior

The empirical literature reports gains across very different domains, but the magnitude and form of the gain depend strongly on how informative the coupling is.

Domain	Representative reported result	Source
Single-coil knee MRI, accelerated acquisition	Joint-MoDL evaluated against MoDL with one parameter set optimized alone and against Joint-UNet	(Aggarwal et al., 2019)
PET–MRI, undersampled MRI	JRM evaluated on PET PSNR and MRI PSNR, surpassing LPLS and Joint ISTA-Net on 167 subjects with 5 slices each from ADNI	(Xie et al., 2024)
PET–MRI, undersampled MRI	MC-Diffusion evaluated on PET PSNR and MRI PSNR	(Xie et al., 2023)
Head CBCT, 20-view sparse-view regime	JRM-ADM evaluated on PSNR and SSIM against JRM-TV and FDK	(Paepe et al., 18 Apr 2025)
Real-world 3D scenes, Replica / ScanNet++	JRM evaluated on CD, NC, and F1, outperforming DPRecon and FM	(Wu et al., 27 Mar 2026)
Multi-view compressed images	gain over independent H.264 Intra at low rates; better than DISCOVER on average	(Thirumalai et al., 2012)

Older variational JRMs report qualitatively similar patterns. The edge-reconstruction model for multi-contrast MRI reduces error faster than BCS and FCSA-MT in convergence plots, and its ER-weighted variant typically achieves lower relative error at the same time budget (Chen et al., 2017). In PET–MRI with coupled Bregman iterations, the ICB-JRM configuration yields the highest PET SSIM under all tested MRI sampling patterns and matches or exceeds the baseline MRI SSIM while suppressing cross-channel artifacts (Rasch et al., 2017). In multi-energy CT, joint regularization outperforms channel-by-channel reconstruction, and S+TV with interleaved angles achieves nearly identical RMSE and MSSIM to single-energy TV with a larger angle set at each energy, using only one-third the dose (Toivanen et al., 2019). In multimodal MRI reconstruction and synthesis, adding the synthesis penalty improves T1/T2 reconstruction and yields a PSNR gain, reported together with SSIM, over the best baseline under undersampling (Bian et al., 2022).

This body of evidence suggests that JRM is most effective when the coupling variable encodes a real shared signal: common anatomy, common edges, repeated object identity, or physically coherent motion.

6. Limitations, misconceptions, and open directions

A frequent misconception is that “joint” invariably means “better.” The literature is more conditional. In object-centric 3D reconstruction, the negative-pair ratio is critical: one tested share of negatives gives the best trade-off, whereas the other tested shares either cause over-aggregation or make the model ignore companions (Wu et al., 27 Mar 2026). In multi-energy CT, pure D1 or S penalties can produce spatially noisy images, and only the D1+TV or S+TV variants restore spatial smoothness (Toivanen et al., 2019). In multi-view compressed imaging, inaccurate depth estimation from highly quantized images degrades joint recovery, and TV can oversmooth texture at high rates (Thirumalai et al., 2012).

Another misconception is that JRM denotes a settled architecture. In fact, the term spans convex optimization, primal–dual and Bregman schemes, alternating minimization, multiplicative MM updates, unrolled networks, diffusion posterior sampling, bilevel training, and latent-space flow matching (Choi et al., 2017, Arridge et al., 2020, Aggarwal et al., 2019, Xie et al., 2023). The unifying element is coupled estimation, not a single solver.

Computational cost remains a persistent limitation. MC-Diffusion reports inference times on the order of minutes per slice on a V100 GPU (Xie et al., 2023). JRM-ADM requires minutes per sparse-view reconstruction on a modern GPU and is described as too slow for emergent clinical use (Paepe et al., 18 Apr 2025). Several works therefore propose reduced parametrizations, warm-up schedules, stationary-operator speed-ups, or alternating subproblem solvers to keep joint estimation tractable (Aggarwal et al., 2019, Arridge et al., 2020).

Uncertainty quantification is also unevenly handled. Deterministic JRMs generally return point estimates, which motivated the probabilistic extension pJRM for CO$_2$-plume monitoring; pJRM uses a shared generative network and per-survey latent codes to obtain posterior distributions rather than only point estimates (Deng et al., 30 Jan 2025). This suggests a broader frontier in which joint reconstruction is paired with posterior calibration rather than only improved mean performance.

The open directions stated across the literature are diverse but convergent in spirit: truly 2D non-parametric sampling and hardware-constrained optimization in MRI, 3D and non-Cartesian acquisition design, real-patient CBCT with non-rigid motion, 3D PET–MRI and tighter physics integration, unpaired multimodal training, and larger heterogeneous groups for repeated-object reconstruction (Aggarwal et al., 2019, Paepe et al., 18 Apr 2025, Xie et al., 2023, Bian et al., 2022, Wu et al., 27 Mar 2026). Taken together, these directions indicate that JRM has evolved from joint regularization of multiple images into a general framework for coupled inference under shared structure, with formulations ranging from convex edge models to score-based and flow-matching generative priors.