
MFI-ResNet: MeanFlow-Incubated ResNet

Updated 23 November 2025
  • The paper introduces MFI-ResNet, which replaces stacked residual blocks with one- or two-step MeanFlow mappings to significantly reduce parameters while preserving accuracy.
  • It leverages the formal equivalence between residual blocks and ODE discretizations, using flow-field modeling to efficiently align feature states.
  • Selective incubation restores shallow ResNet layers, balancing the flow-based compression with discriminative performance, as evidenced by slight accuracy gains on CIFAR benchmarks.

MeanFlow-Incubated ResNet (MFI-ResNet) denotes a neural architecture optimization methodology that replaces multi-step residual processing in standard ResNet with one- or two-step generative mappings via a “MeanFlow” module that models the average velocity field, and then selectively restores layers (“incubation”) to balance parameter efficiency with discriminative performance. The technique leverages the formal equivalence between residual blocks and ordinary differential equation (ODE) discretizations, introducing flow-based compression and expansion phases to obtain high-accuracy, parameter-light models (Sun et al., 16 Nov 2025).

1. Theoretical Underpinnings

ResNet’s architecture can be interpreted as a discretized ODE in feature space, where each residual block approximates an instantaneous velocity increment:

$$z_{t+1} = z_t + f(z_t, t)\,\Delta t$$

This Euler discretization accumulates incremental changes via multiple residual blocks per stage, aligning with the ODE framework described in the literature (Eq. 1 in (Sun et al., 16 Nov 2025)).
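
As a concrete illustration, a minimal PyTorch sketch of a residual block read as one Euler step is given below; the class name and the layer choices inside $f$ are illustrative, not taken from (Sun et al., 16 Nov 2025).

```python
import torch
from torch import nn

class EulerResidualBlock(nn.Module):
    """Illustrative sketch: a residual block read as one Euler step
    z_{t+1} = z_t + f(z_t) * dt of the feature-space ODE."""
    def __init__(self, channels: int, dt: float = 1.0):
        super().__init__()
        self.dt = dt
        # f(z, t): the block's velocity estimate (a standard conv residual branch)
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Euler update: add the instantaneous velocity increment to the current state
        return z + self.dt * self.f(z)
```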

MeanFlow, introduced by Geng et al., generalizes this to a single-step flow-matching scheme that learns the average velocity field $u_\theta(z, t)$. The mapping between two feature states $z_{\text{align}}$ and $z_{\text{target}}$ over a temporal interval is governed by:

$$\frac{dz(t)}{dt} = u_\theta(z(t), t), \qquad z(0) = z_{\text{align}}, \quad z(1) = z_{\text{target}}$$

The learning objective is a flow-matching loss:

$$\mathcal{L}_{\mathrm{MF}} = \frac{1}{N} \sum_{i=1}^N \left\| u_\theta(z_i, t_i) - u_{\mathrm{target},i} \right\|_2^2$$

where $u_{\mathrm{target}} = v - (t - r)\,\frac{\partial u_\theta}{\partial t}$ and $v = z_{\text{align}} - z_{\text{target}}$ (Sec. 3.1, (Sun et al., 16 Nov 2025)).
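
A hedged sketch of this objective in PyTorch follows. The interpolation path and its direction, the tensor shapes of $r$ and $t$ (assumed broadcastable against the feature maps), and the use of a forward-mode JVP to obtain the derivative term are assumptions layered on top of the equations above, not details from the paper.

```python
import torch
import torch.nn.functional as F

def meanflow_loss(u_theta, z_align, z_target, r, t):
    """Sketch of the flow-matching loss L_MF.
    u_theta(z, r, t) predicts the mean velocity over the interval [r, t];
    r and t are tensors assumed broadcastable against the feature maps."""
    # Assumed linear path between the two feature states; v is its time derivative.
    # (The direction/sign convention of the path is an assumption of this sketch,
    # chosen so the JVP tangent below matches the v defined in Sec. 3.1.)
    z_t = (1.0 - t) * z_target + t * z_align
    v = z_align - z_target
    # Derivative of u_theta along the path via a forward-mode JVP
    # (tangents: dz/dt = v, dr/dt = 0, dt/dt = 1).
    u, du_dt = torch.func.jvp(
        u_theta,
        (z_t, r, t),
        (v, torch.zeros_like(r), torch.ones_like(t)),
    )
    u_target = (v - (t - r) * du_dt).detach()  # stop-gradient on the regression target
    return F.mse_loss(u, u_target)
```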

2. Compression Phase: MeanFlow Mapping Modules

Each of the four ResNet stages is replaced with a “MeanFlow mapping module” $M_l$. The module structure is:

  • A 1×1 convolution + BatchNorm + ReLU for dimensional alignment, outputting $z_{\text{align}}$.
  • An ODE-driven flow network $u_\theta$ that learns the mean velocity.

For stages 1–3, a single MeanFlow step is used. For stage 4, two sequential MeanFlow sub-steps are performed:

$$z^{(4)}_{0.5} = z^{(4)}_0 + 0.5 \cdot u^{(4,1)}\!\left(z^{(4)}_0,\, 0,\, 0.5\right)$$
$$z^{(4)}_{1} = z^{(4)}_{0.5} + 0.5 \cdot u^{(4,2)}\!\left(z^{(4)}_{0.5},\, 0.5,\, 1\right)$$

Each module is independently trained for 300 epochs (AdamW optimizer, learning rate $2\times 10^{-4}$, batch size 128/GPU, 9×RTX 3090), using fixed features from a pretrained ResNet. The resulting modules are cascaded and lightly fine-tuned with cross-entropy loss (stem frozen).
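
Under assumptions about the form of $u_\theta$ (here a small convolutional branch conditioned on the interval endpoints; the paper's exact architecture may differ), a sketch of such a mapping module could look like:

```python
import torch
from torch import nn

class MeanFlowMappingModule(nn.Module):
    """Illustrative sketch of a stage-replacement module M_l: a 1x1 conv alignment
    head followed by learned mean-velocity fields, one per MeanFlow sub-step
    (steps=1 for stages 1-3, steps=2 for stage 4)."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 2, steps: int = 1):
        super().__init__()
        self.steps = steps
        # 1x1 conv + BN + ReLU for dimensional alignment -> z_align
        self.align = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        # One mean-velocity field u^(l,k) per sub-step; feeding the interval
        # endpoints as extra feature planes is an assumption of this sketch.
        self.fields = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(out_ch + 2, out_ch, 3, padding=1, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, 3, padding=1),
            )
            for _ in range(steps)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.align(x)
        dt = 1.0 / self.steps
        for k, field in enumerate(self.fields):
            r, t = k * dt, (k + 1) * dt
            b, _, h, w = z.shape
            cond = torch.cat(
                [z,
                 torch.full((b, 1, h, w), r, device=z.device, dtype=z.dtype),
                 torch.full((b, 1, h, w), t, device=z.device, dtype=z.dtype)],
                dim=1,
            )
            # z_t = z_r + (t - r) * u^{(l,k)}(z_r, r, t)
            z = z + dt * field(cond)
        return z
```

With `steps=2` the forward pass reproduces the two sub-step update written above for stage 4; with `steps=1` it reduces to the single-step mapping used for stages 1–3.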

| Model | Parameters (M) | Allocation in Last Stage (%) |
|---|---|---|
| ResNet-50 | 23.51 | ~60 |
| Full MeanFlow | 5.11 | – |

Stage 4, containing ~14.96 M parameters in ResNet-50, is effectively replaced with a two-step MeanFlow system, reducing parameter count ~78% versus a standard model (Table 1, (Sun et al., 16 Nov 2025)).
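
Assuming the ~78% figure compares the full MeanFlow configuration in the table (5.11 M parameters) against ResNet-50 (23.51 M), the reduction works out as:

$$\frac{23.51 - 5.11}{23.51} \approx 0.78$$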

3. Expansion Phase: Selective Incubation

ResNet’s parameter allocation is heavily imbalanced, with stages 1–3 containing ~38–40% of parameters and stage 4 ~60%. In the expansion/incubation phase, MFI-ResNet incrementally restores the original ResNet layers in the shallow stages (1, 2, 3).

The process involves:

  1. Sequentially replacing $M_1$, $M_2$, $M_3$ with their pre-trained ResNet stage counterparts.
  2. Initializing with corresponding ResNet weights and freezing the non-incubated modules.
  3. Training each newly-incubated stage for 200 epochs (learning rate $10^{-3}$).
  4. After all three shallow stages are restored, the resulting model comprises ResNet stages 1–3 followed by a two-step MeanFlow stage 4.
  5. All parameters are unfrozen and globally fine-tuned for 100 epochs (learning rate $10^{-3}$).

Pseudocode provided in section 3.3 of (Sun et al., 16 Nov 2025) illustrates this pipeline.
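
That pseudocode is not reproduced here; the following Python sketch mirrors the five steps above (the `stages` container, its 1-based indexing, and the `train_fn` helper are hypothetical names, not the paper's API).

```python
import copy

def incubate(mfi_model, pretrained_resnet, train_fn):
    """Hypothetical sketch of the incubation pipeline.
    `stages` is assumed to map stage index l -> module; `train_fn` runs
    cross-entropy training on whatever parameters currently require gradients."""
    for l in (1, 2, 3):
        # Steps 1-2: swap MeanFlow module M_l for the pretrained ResNet stage,
        # initialize it with the ResNet weights, and freeze every other module.
        mfi_model.stages[l] = copy.deepcopy(pretrained_resnet.stages[l])
        for p in mfi_model.parameters():
            p.requires_grad = False
        for p in mfi_model.stages[l].parameters():
            p.requires_grad = True
        # Step 3: train the newly incubated stage (200 epochs, lr = 1e-3).
        train_fn(mfi_model, epochs=200, lr=1e-3)
    # Step 4: stages 1-3 are now ResNet blocks; stage 4 remains two-step MeanFlow.
    # Step 5: unfreeze everything and fine-tune globally (100 epochs, lr = 1e-3).
    for p in mfi_model.parameters():
        p.requires_grad = True
    train_fn(mfi_model, epochs=100, lr=1e-3)
    return mfi_model
```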

4. Training Regimes and Optimization

Training utilizes standard vision benchmarks (CIFAR-10, CIFAR-100; 50K/10K train/test, with random crop, flip, mean-std normalization). Key hyperparameters:

  • Optimizer: AdamW, weight decay 0.01, cosine annealing schedule.
  • MeanFlow mapping: 300 epochs, learning rate $2\times 10^{-4}$, batch size 128/GPU.
  • Incubation (per stage): 200 epochs, learning rate $1\times 10^{-3}$.
  • Global fine-tuning: 100 epochs, learning rate $1\times 10^{-3}$.
  • Label smoothing $\epsilon = 0.1$.

This protocol ensures both efficient feature transfer via MeanFlow and maintenance of discriminative power through ResNet block restoration (Sun et al., 16 Nov 2025).
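
As a hedged sketch, these hyperparameters map onto standard PyTorch components roughly as follows; the helper function and its phase names are illustrative, not from the paper.

```python
from torch import nn, optim

def make_training_setup(model, phase: str):
    """Sketch: optimizer, schedule, and loss per training phase.
    phase is one of 'meanflow', 'incubation', 'finetune' (illustrative names)."""
    lr = {"meanflow": 2e-4, "incubation": 1e-3, "finetune": 1e-3}[phase]
    epochs = {"meanflow": 300, "incubation": 200, "finetune": 100}[phase]
    optimizer = optim.AdamW(model.parameters(), lr=lr, weight_decay=0.01)
    scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    # Cross-entropy with label smoothing eps = 0.1 for the discriminative phases;
    # the MeanFlow mapping phase uses the flow-matching loss from Section 1 instead.
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
    return optimizer, scheduler, criterion, epochs
```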

5. Experimental Evaluation

Empirical assessment on CIFAR-10 and CIFAR-100 demonstrates that MFI-ResNet achieves substantial efficiency gains without accuracy loss.

| Model | Params (M) | CIFAR-10 Acc. (%) | CIFAR-100 Acc. (%) |
|---|---|---|---|
| ResNet-50 | 23.51 | 95.34 | 75.80 |
| MFI-ResNet-50 | 12.62 | 95.56 (+0.22) | 75.93 (+0.13) |

Parameter reductions reach 46.28% (CIFAR-10) and 45.59% (CIFAR-100), while test accuracy improves slightly on both tasks (Table A, (Sun et al., 16 Nov 2025)).

6. Analysis and Interpretative Insights

The success of MFI-ResNet hinges on the capacity of generative “flow-fields” (MeanFlow modules) to encapsulate multi-block residual transformations as a single, explicit mapping over feature space. This stands in contrast to the traditional approach, where a sequence of shallow increments (instantaneous velocities) is accumulated per stage.

Empirical results suggest that shallow network stages crucially benefit from full residual hierarchies to capture local discriminative features, whereas deep, high-dimensional stages can be summarized by two MeanFlow steps with negligible accuracy loss but substantial parameter savings. This provides evidence for a connection between generative ODE-style modeling (as in flow matching frameworks) and discriminative residual design, indicating potential for further architectural synergies (Sun et al., 16 Nov 2025).

7. Broader Implications and Directions

MFI-ResNet demonstrates that substituting stacked residual blocks with a parameter-efficient flow-field mapping is viable for deep discriminative networks. A plausible implication is that future architectures may further bridge generative and discriminative paradigms, exploiting flow-based representations for both feature efficiency and learning dynamics. The explicit linkage between ODE-based flow fields and discriminative layer composition constitutes a new perspective for neural architecture design, meriting further study of flow-matching principles and their integration with established deep learning models (Sun et al., 16 Nov 2025).

References

  1. Sun et al., “MFI-ResNet: MeanFlow-Incubated ResNet,” 16 Nov 2025.
