Training encoder–decoders for long, high‑resolution CT reconstruction

Develop a three-dimensional encoder–decoder architecture and training methodology that can reconstruct long, high-resolution computed tomography (CT) sequences with high fidelity and spatial coherence, overcoming current limitations that prevent robust end-to-end reconstruction at full resolution and length.

Background

The paper reviews prior text-conditional 3D medical image generation methods, which rely on either cascaded low-resolution stages or lightweight architectures. Both approaches lead to spatial artifacts or reduced fidelity when handling long, high-resolution CT sequences.

Within this context, the authors explicitly note that robustly training a high-capacity 3D encoder–decoder to reconstruct long, high-resolution CT sequences remains unresolved. This gap motivates the BTB3D framework and its three-stage training strategy.

References

However, training an encoder-decoder capable of reconstructing long, high-resolution CT sequences remains an open challenge.

— Better Tokens for Better 3D: Advancing Vision-Language Modeling in 3D Medical Imaging (2510.20639 - Hamamci et al., 23 Oct 2025) in Introduction (Section 1)
