Cyclic Test-Time Adaptation

Updated 30 November 2025

Cyclic-TTA is a framework that simulates recurring domains to assess rapid adaptation and retention in test-time settings.
It leverages cyclical adaptation protocols with alternating updates between networks to improve metrics such as MPJPE and temporal smoothness in 3D reconstruction tasks.
By quantifying both adaptability and forgetting, Cyclic-TTA sets a rigorous standard for continual learning in dynamic, non-stationary environments.

Cyclic Test-Time Adaptation (Cyclic-TTA) designates a paradigm and experimental protocol that evaluates and advances test-time adaptation (TTA) in settings where domains recur, demanding both rapid adaptation to newly observed domains and retention of learned knowledge for recurring domains. This recurrence exposes the limitations of classical TTA and continual TTA (CTTA) approaches, which typically do not address domain reappearance or the challenges it poses for stability, memory, and fast recovery of accuracy. Cyclic-TTA has been introduced both as a general benchmark for continual adaptation under domain recurrence (Iftee et al., 23 Nov 2025) and as a core methodological innovation in the context of 3D human mesh reconstruction via the cyclical co-adaptation of denoising and reconstruction networks (Nam et al., 2023).

1. Formalization of the Cyclic-TTA Benchmark

Formally, Cyclic-TTA is defined by a set of $K$ domain groups $G = \{G_1, \dots, G_K\}$. The test stream is constructed as an infinite sequence that repeatedly cycles through these groups, creating the sequence

$D = [G_1, G_2, \dots, G_K, G_1, G_2, \dots, G_K, \dots]$

At time step $t$, the active group index $c(t)$ is given by:

$c(t) = ((t - 1) \bmod (K \cdot r)) \bmod K + 1$

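where $r \in \mathbb{N}$ is the cycle repetition rate, controlling how many times all groups are visited before the stream wraps. Because $K$ divides $K \cdot r$, the index simplifies to $c(t) = ((t - 1) \bmod K) + 1$, so $r$ sets the length of the wrap period ($K \cdot r$ steps) without changing the visiting order. Each sample $x_t$ is drawn from the distribution $p(x \mid G_{c(t)})$.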
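As an illustration, the schedule can be written as a small function. This is a minimal sketch, not code from either cited paper: the names `active_group` and `schedule` are hypothetical, and treating one step as one unit of the stream (a sample, a batch, or a whole domain segment) is an assumption, since the granularity of $t$ is not fixed by the formula.

```python
from typing import List


def active_group(t: int, K: int, r: int = 1) -> int:
    """Return the 1-based group index c(t) for the 1-based time step t."""
    if t < 1 or K < 1 or r < 1:
        raise ValueError("t, K and r must be positive integers")
    # Direct transcription of c(t) = ((t - 1) mod (K * r)) mod K + 1.
    return ((t - 1) % (K * r)) % K + 1


def schedule(num_steps: int, K: int, r: int = 1) -> List[int]:
    """Group indices for steps 1..num_steps of the cyclic stream."""
    return [active_group(t, K, r) for t in range(1, num_steps + 1)]


print(schedule(7, K=3, r=2))  # [1, 2, 3, 1, 2, 3, 1]
```

With $K = 3$ the stream visits groups 1, 2, 3 and then starts over, which is the recurrence the benchmark is built around. Running the function with different values of `r` gives the same visiting order.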

Cyclic-TTA thus simulates a realistic non-stationary environment where the system is exposed to shifting and then recurring domains (e.g., weather patterns, sensor conditions, corruption types). For example, in the ImageNet-C or CIFAR-C setting, the 15 corruptions are arranged into five groups (Noise, Blur, Weather, Digital, Distortion), each constituting a domain group $G_j$, and the evaluation cycles through these groups in sequence (Iftee et al., 23 Nov 2025).

2. Methodologies Leveraging Cyclic-TTA

Cyclic-TTA protocols are central in evaluating and shaping new TTA algorithms. In "Cyclic Test-Time Adaptation on Monocular Video for 3D Human Mesh Reconstruction" (Nam et al., 2023), the CycleAdapt framework cyclically adapts two specialized networks:

HMRNet: A Human Mesh Reconstruction Network, which processes single RGB images and regresses SMPL pose ($\hat\theta \in \mathbb{R}^{144}$), shape ($\hat\beta \in \mathbb{R}^{10}$), and camera parameters ($\hat k \in \mathbb{R}^3$) to predict 3D human meshes.
MDNet: A Motion Denoising Network, which consumes temporal sequences of HMRNet-predicted poses (length $T = 49$) and produces denoised pose vectors, serving as pseudo-3D ground truth targets.

In each adaptation cycle, the system alternates between updating HMRNet (using 3D supervision from MDNet combined with standard 2D reprojection losses) and updating MDNet (using masked denoising losses on sequences of pose vectors). This cyclical back-and-forth progressively improves both networks, enhancing the overall 3D reconstruction performance on non-stationary, real-world test videos.

Cyclic-TTA as a benchmark is further illustrated by Iftee et al. (23 Nov 2025), where SloMo-Fast employs a slow-momentum "teacher" to retain long-term domain knowledge and a fast-adaptive "teacher" to quickly integrate new information, and is explicitly evaluated over the cyclically repeated domain schedule.
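The text above does not give SloMo-Fast's update rules, so the following is only an illustration of the general slow/fast idea and not the authors' method. It assumes, hypothetically, that both teachers track a student's weights by exponential moving average (EMA) with different momenta. The function `ema_update` and the momentum values 0.99 and 0.5 are invented for the example.

```python
from typing import List


def ema_update(teacher: List[float], student: List[float],
               momentum: float) -> List[float]:
    """One EMA step: teacher <- momentum * teacher + (1 - momentum) * student."""
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same length")
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must lie in [0, 1]")
    return [momentum * w_t + (1.0 - momentum) * w_s
            for w_t, w_s in zip(teacher, student)]


# Toy weights: the student is held fixed so the two tracking speeds are visible.
student = [1.0, -2.0]
slow_teacher = [0.0, 0.0]  # assumed high momentum: changes slowly, retains the past
fast_teacher = [0.0, 0.0]  # assumed low momentum: follows the student quickly

for _ in range(10):
    slow_teacher = ema_update(slow_teacher, student, momentum=0.99)
    fast_teacher = ema_update(fast_teacher, student, momentum=0.5)

print(slow_teacher, fast_teacher)
```

Under these assumptions, after ten steps the fast teacher has almost reached the student while the slow teacher has barely moved. That is the retention-versus-plasticity trade-off the two teachers are described as covering.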

3. Loss Functions, Update Rules, and Adaptation Pseudocode

The cyclic adaptation in CycleAdapt (Nam et al., 2023) is specified by the following losses and update schedule:

HMRNet adaptation loss:

$L_{\rm HMR} = L_{\rm SMPL} + L_{2D}$

where

$L_{\rm SMPL} = \|\hat\theta - \theta'\|_1 + \gamma \|\hat\beta - \beta'\|_1, \quad \gamma = 0.001$

$L_{2D} = \bigl\| \Pi_{\hat k}(J\,\hat M) - J^{2D} \bigr\|_1$

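with $J$ the SMPL joint-regression matrix, $\hat M$ the predicted mesh, $\Pi_{\hat k}$ the weak-perspective projection under the predicted camera $\hat k$, and $J^{2D}$ the target 2D joint locations. The targets $\theta'$ and $\beta'$ are the stored pseudo-ground-truth pose and shape ($D_i$ in the pseudocode below).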
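The HMRNet loss can be evaluated on plain Python lists as follows. This is a hedged sketch with hypothetical names (`l1`, `weak_perspective`, `hmr_loss`). It assumes the 3D joints $J\,\hat M$ have already been computed, and it assumes the weak-perspective convention $(x, y, z) \mapsto (s x + t_x,\ s y + t_y)$ with $\hat k = (s, t_x, t_y)$. Other conventions exist, and the source does not say which one is used.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two equally long vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def weak_perspective(joints_3d: Sequence[Point3D],
                     cam: Tuple[float, float, float]) -> List[Point2D]:
    """Assumed convention: scale x and y by s, then translate; depth is dropped."""
    s, tx, ty = cam
    return [(s * x + tx, s * y + ty) for x, y, _z in joints_3d]


def hmr_loss(theta_hat: Sequence[float], beta_hat: Sequence[float],
             theta_tgt: Sequence[float], beta_tgt: Sequence[float],
             joints_3d: Sequence[Point3D], cam: Tuple[float, float, float],
             joints_2d: Sequence[Point2D], gamma: float = 0.001) -> float:
    """L_HMR = L_SMPL + L_2D, with gamma weighting the shape term."""
    loss_smpl = l1(theta_hat, theta_tgt) + gamma * l1(beta_hat, beta_tgt)
    projected = weak_perspective(joints_3d, cam)
    loss_2d = sum(abs(u - u_t) + abs(v - v_t)
                  for (u, v), (u_t, v_t) in zip(projected, joints_2d))
    return loss_smpl + loss_2d


# Tiny worked case: pose error 1.0, shape error 1.0 (weighted by 0.001),
# and a reprojection error of 0.5 + 0.5 on a single joint.
example = hmr_loss([0.5, -0.5], [1.0], [0.0, 0.0], [0.0],
                   [(1.0, 2.0, 5.0)], (2.0, 0.5, -0.5), [(2.0, 3.0)])
print(round(example, 3))  # 2.001
```

The small value of $\gamma$ means the shape term contributes very little next to the pose and reprojection terms, as the worked case shows.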

MDNet adaptation loss:

$L_{\rm MD} = \frac{1}{T}\sum_{t=0}^{T-1} m_t\, \|\hat\theta'_t - \hat\theta_t\|_1$

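with $m_t$ being random binary masks for denoising, $\hat\theta_t$ the HMRNet-predicted pose at frame $t$, and $\hat\theta'_t$ the corresponding MDNet output.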
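A minimal sketch of the masked denoising loss follows. The helper names `random_half_mask` and `md_loss` are hypothetical. The mask is used purely as the per-frame weight $m_t$ from the formula. Setting exactly $\lfloor T/2 \rfloor$ entries to 1 is an assumption, loosely based on the "mask half the frames" step in the pseudocode below.

```python
import random
from typing import List, Sequence


def random_half_mask(T: int, rng: random.Random) -> List[int]:
    """Binary mask with floor(T / 2) ones at random positions (assumed ratio)."""
    chosen = set(rng.sample(range(T), T // 2))
    return [1 if t in chosen else 0 for t in range(T)]


def md_loss(denoised: Sequence[Sequence[float]],
            predicted: Sequence[Sequence[float]],
            mask: Sequence[int]) -> float:
    """L_MD = (1 / T) * sum_t m_t * ||denoised_t - predicted_t||_1."""
    T = len(predicted)
    if T == 0 or len(denoised) != T or len(mask) != T:
        raise ValueError("all inputs must share the same non-zero length T")
    total = 0.0
    for m_t, d_t, p_t in zip(mask, denoised, predicted):
        total += m_t * sum(abs(a - b) for a, b in zip(d_t, p_t))
    # Normalisation is by the window length T, as in the formula,
    # not by the number of frames with m_t = 1.
    return total / T


mask = random_half_mask(49, random.Random(0))
print(sum(mask), len(mask))  # 24 49

toy = md_loss([[1.0], [2.0], [3.0], [4.0]],
              [[0.0], [0.0], [0.0], [0.0]],
              [1, 0, 1, 0])
print(toy)  # 1.0
```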

Cyclic Adaptation Algorithm:

# Pseudocode for a single test video X = {x_i}
initialize D_i ← (0,0) for all i
load pretrained M_HMR, M_MD
for cycle c = 1...C:
    # HMRNet adaptation
    for each image x_i:
        θ'_i, β'_i = D_i
        θ̂_i, β̂_i, k̂_i = M_HMR(x_i)
        compute L_HMR = L_SMPL + L_2D
        update M_HMR by ∇L_HMR
        D_i ← (θ̂_i, β̂_i)

    # MDNet adaptation
    for each temporal window {θ̂_j, ..., θ̂_{j+T-1}} from D:
        mask half the frames (random m_t)
        θ̂'_j, ... = M_MD(masked sequence)
        compute L_MD, update M_MD by ∇L_MD
        D_{j..j+T-1} ← θ̂'_{j..j+T-1}   # store denoised poses as targets for the next HMRNet phase

This cyclic process continues for $C$ cycles, finally outputting refined human mesh predictions.
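The control flow of the pseudocode can be sketched in runnable form by passing the two network updates in as callables. Everything here is illustrative: `cycle_adapt`, `hmr_step`, `md_step` and the toy stand-ins are hypothetical names. Each callable is assumed to perform its own gradient update internally. Non-overlapping windows (stride equal to the window length) are an assumption, because the source does not state how windows are drawn.

```python
from typing import Any, Callable, List, Sequence, Tuple

Pose = List[float]
Shape = List[float]
# Hypothetical interfaces: one adaptation step plus a forward pass each.
HmrStep = Callable[[Any, Pose, Shape], Tuple[Pose, Shape]]
MdStep = Callable[[List[Pose]], List[Pose]]


def cycle_adapt(frames: Sequence[Any], hmr_step: HmrStep, md_step: MdStep,
                num_cycles: int, window: int,
                pose_dim: int, shape_dim: int) -> List[Tuple[Pose, Shape]]:
    """Alternate an HMRNet phase and an MDNet phase for num_cycles cycles."""
    # D_i is initialised to zeros, as in the pseudocode.
    store = [([0.0] * pose_dim, [0.0] * shape_dim) for _ in frames]
    for _ in range(num_cycles):
        # HMRNet phase: adapt toward the stored targets, then store predictions.
        for i, x in enumerate(frames):
            target_pose, target_shape = store[i]
            store[i] = hmr_step(x, target_pose, target_shape)
        # MDNet phase: denoise each full window and overwrite the stored poses.
        # Trailing frames that do not fill a window are left untouched.
        for j in range(0, len(frames) - window + 1, window):
            poses = [store[i][0] for i in range(j, j + window)]
            for offset, pose in enumerate(md_step(poses)):
                store[j + offset] = (pose, store[j + offset][1])
    return store


def toy_hmr_step(x: float, target_pose: Pose, target_shape: Shape):
    # Stand-in: the prediction moves halfway from the input toward the target.
    return [0.5 * x + 0.5 * target_pose[0]], list(target_shape)


def toy_md_step(poses: List[Pose]) -> List[Pose]:
    # Stand-in "denoiser": replace every pose in the window by the window mean.
    mean = sum(p[0] for p in poses) / len(poses)
    return [[mean] for _ in poses]


result = cycle_adapt([0.0, 10.0, 0.0, 10.0], toy_hmr_step, toy_md_step,
                     num_cycles=1, window=2, pose_dim=1, shape_dim=1)
print([pose for pose, _ in result])  # [[2.5], [2.5], [2.5], [2.5]]
```

The toy stand-ins only exercise the bookkeeping: each phase reads what the other wrote into the store, and that coupling is what makes the adaptation cyclic.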

4. Distinctions Between Cyclic-TTA and Standard Continual TTA

Standard continual test-time adaptation (CTTA) assumes a single, non-repeating pass through $K$ domains: $D = [D_1, D_2, \dots, D_K]$. Each domain is encountered once, with adaptation focused on minimizing catastrophic forgetting.

Cyclic-TTA, by contrast, explicitly introduces domain recurrence, challenging methods not merely to adapt but to retain knowledge, so that when a domain returns the model quickly recovers its previous accuracy. This penalizes approaches that forget prior domains and must re-learn from scratch upon recurrence. Cyclic-TTA therefore quantifies both adaptivity and retention by measuring recovery speed (time-to-plateau), average error across cycles, and performance drop (forgetting) when domains recur (Iftee et al., 23 Nov 2025).

5. Evaluation Protocols and Metrics in Cyclic-TTA

Cyclic-TTA requires rigorous, streaming evaluation across multiple cycles. The main reported metrics (Iftee et al., 23 Nov 2025) are:

Average error over $C$ cycles:

$\mathrm{Err}_{\mathrm{Cyclic}} = \frac{1}{C N} \sum_{i=1}^{C} \sum_{t=(i-1)N+1}^{iN} \ell_t$

where $N$ is the number of domain instances (time steps) per cycle and $\ell_t$ is the per-sample classification loss.
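A minimal sketch of this average, with hypothetical function names, assuming the per-step losses are already collected in stream order:

```python
from typing import List, Sequence


def cyclic_error(losses: Sequence[float], num_cycles: int,
                 steps_per_cycle: int) -> float:
    """Err_Cyclic = (1 / (C * N)) * sum of per-step losses over all C cycles."""
    expected = num_cycles * steps_per_cycle
    if expected == 0 or len(losses) != expected:
        raise ValueError("expected exactly C * N losses with C, N > 0")
    return sum(losses) / expected


def per_cycle_error(losses: Sequence[float], num_cycles: int,
                    steps_per_cycle: int) -> List[float]:
    """Mean loss inside each cycle: the inner sums of the formula, divided by N."""
    if len(losses) != num_cycles * steps_per_cycle:
        raise ValueError("expected exactly C * N losses")
    n = steps_per_cycle
    return [sum(losses[i * n:(i + 1) * n]) / n for i in range(num_cycles)]


# Toy 0/1 losses for C = 2 cycles of N = 3 steps each.
stream_losses = [1.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(cyclic_error(stream_losses, 2, 3))
print(per_cycle_error(stream_losses, 2, 3))
```

Reporting the per-cycle means next to the overall average makes it visible whether the error falls from one cycle to the next, which the single aggregate hides.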

Forgetting per domain $d$:

$\mathrm{Forget}_d = a_d^{(\mathrm{first})} - a_d^{(\mathrm{last})}$

comparing accuracy at the first and last visits to domain $d$; a positive value indicates that accuracy dropped between them.
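In code, the forgetting measure is a single subtraction per domain. The sketch below uses a hypothetical `forgetting` helper and made-up accuracy values. It assumes the per-visit accuracies are stored in visit order.

```python
from typing import Dict, List


def forgetting(acc_by_visit: Dict[str, List[float]]) -> Dict[str, float]:
    """Forget_d = accuracy at the first visit minus accuracy at the last visit."""
    result = {}
    for domain, accs in acc_by_visit.items():
        if not accs:
            raise ValueError(f"no visits recorded for domain {domain!r}")
        result[domain] = accs[0] - accs[-1]
    return result


# Illustrative numbers only, not results from any paper.
scores = forgetting({"noise": [0.70, 0.65, 0.60], "blur": [0.50, 0.55]})
print(scores)
```

Here "noise" has positive forgetting because accuracy fell over repeated visits. "blur" has a negative value, meaning accuracy improved on return.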

Stability per group $j$:

$\mathrm{STD}_j = \mathrm{std}\left( \{ a_t \mid c(t) = j, t = 1,\dots, CN \} \right)$

quantifying performance fluctuations, where $a_t$ is the accuracy at step $t$ and $c(t)$ is the active group index.
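The per-group stability can be sketched by bucketing accuracies by their active group. The function name is hypothetical. Using the population standard deviation (`statistics.pstdev`) is an assumption, since the formula does not say whether the population or the sample estimator is intended.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence


def stability_per_group(accuracies: Sequence[float],
                        groups: Sequence[int]) -> Dict[int, float]:
    """STD_j: standard deviation of a_t over all steps t with c(t) = j."""
    if len(accuracies) != len(groups):
        raise ValueError("need one group index per accuracy value")
    buckets: Dict[int, List[float]] = defaultdict(list)
    for a_t, c_t in zip(accuracies, groups):
        buckets[c_t].append(a_t)
    # Population standard deviation (assumed); swap in statistics.stdev
    # if the sample estimator is wanted and each group has >= 2 values.
    return {j: statistics.pstdev(values) for j, values in buckets.items()}


# Two groups visited alternately over four steps (illustrative values).
std = stability_per_group([0.6, 0.8, 0.8, 0.8], [1, 2, 1, 2])
print(std)
```

Group 2 scores the same on both visits and gets a standard deviation of zero, while group 1 fluctuates between visits.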

Adaptation Rate (optional):

$\mathrm{AdaptRate} = \frac{\mathrm{APS}}{\mathrm{TTP} - \lambda\,\mathrm{STD}}$

(where APS is the average positive slope, TTP is the time-to-plateau, and $\lambda$ weights the stability term STD).

These metrics expose both the speed and robustness of adaptation and knowledge retention across cycles.

6. Experimental Findings and Comparative Performance

Empirical studies using CycleAdapt for 3D human mesh reconstruction reveal that prior TTA methods relying solely on 2D evidence (e.g., BOA (Guan et al., 2021), DynaBOA (Guan et al., 2022), DAPA (Weng et al., 2022)) exhibit greater depth ambiguity and are less robust to noise or missing cues. By introducing cyclic adaptation and leveraging MDNet-denoised 3D supervision, CycleAdapt achieves lower mean per-joint position error (MPJPE) than these baselines (CycleAdapt MPJPE: 87.7 mm; prior best: 108.0 mm) (Nam et al., 2023).
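For readers unfamiliar with the metric, MPJPE is the mean Euclidean distance between predicted and ground-truth 3D joints. The sketch below uses hypothetical helper names and omits any root or Procrustes alignment. Whether the reported numbers use such an alignment is not stated here, so treat the omission as an assumption. The second helper turns the two reported values into a relative reduction.

```python
import math
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]


def mpjpe(pred: Sequence[Point3D], gt: Sequence[Point3D]) -> float:
    """Mean per-joint position error, in the units of the inputs (e.g. mm)."""
    if not pred or len(pred) != len(gt):
        raise ValueError("need equally many predicted and ground-truth joints")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional error reduction of `improved` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline - improved) / baseline


# Two toy joints: one exact, one off by a 3-4-5 triangle's hypotenuse.
toy_error = mpjpe([(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)],
                  [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)])
print(toy_error)  # 2.5

# Reported values from the paragraph above: 108.0 mm versus 87.7 mm.
print(round(relative_reduction(108.0, 87.7), 3))  # 0.188
```

On the reported figures, the gap between 108.0 mm and 87.7 mm is a relative MPJPE reduction of roughly 19%.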

Key ablations demonstrate:

MDNet adaptation alone (with HMRNet frozen) improves MPJPE from 114.2 to 96.2 mm.
Adding cyclic MDNet-HMRNet retraining further reduces MPJPE to 87.7 mm.
Co-adaptation is necessary; freezing MDNet stagnates HMRNet at higher error.
CycleAdapt maintains bone length consistency and plausible depth order, and achieves ∼40% improvement in temporal smoothness compared to video-only baselines.

A plausible implication is that cyclic adaptation strategies, when evaluated under domain recurrence protocols, offer concrete advantages for both rapid adaptation and long-term retention, the two challenges that domain recurrence introduces.

7. Context, Applications, and Implications

Cyclic-TTA benchmarks and methodologies provide a stringent testbed and design principle for robust, generalizable adaptation algorithms. Practical applications span computer vision (3D mesh reconstruction, domain-robust classification) and any scenario where distribution shift and domain recurrence are anticipated (e.g., autonomous driving, surveillance, prosthetic control). Ongoing research leverages this protocol to develop models capable of both fast reacquisition of skills and strong resistance to catastrophic forgetting, as exemplified by SloMo-Fast’s dual-teacher strategy (Iftee et al., 23 Nov 2025).

By quantifying both transient and persistent adaptation skills, Cyclic-TTA establishes a demanding and realistic standard for continual learning and test-time adaptation, incentivizing architectures and learning strategies optimized for non-stationary, cyclic environments.