VPT-Shallow in ViT and Shallow-Water Modeling

Updated 6 January 2026
  • VPT-Shallow names two unrelated techniques: shallow prompt injection for Vision Transformers and a quadratic pressure profile for efficient shallow-water modeling.
  • In Vision Transformers, it tunes prompts only at the input to the first layer (0.16% of parameters), achieving competitive accuracy while minimizing adaptation complexity.
  • In fluid dynamics, its quadratic non-hydrostatic pressure formulation enables accurate wave simulations with lower computational overhead than full Boussinesq-type models.

VPT-Shallow denotes two distinct, unrelated concepts in current research: (1) a parameter-efficient fine-tuning protocol for Vision Transformers based on shallow prompt token injection ("Visual Prompt Tuning, Shallow"), and (2) a quadratic vertical pressure formulation within non-hydrostatic shallow-water wave modeling ("Vertical Pressure with Quadratic profile, Shallow-Water"). Both share the theme of maximizing effect while minimizing intervention: in model adaptation depth for ViTs, and in pressure-profile complexity for fluid dynamics.

1. Definition and Key Frameworks

Visual Prompt Tuning—Shallow (ViT context):

VPT-Shallow (VPT-S) is a variant of Visual Prompt Tuning (VPT) for frozen Vision Transformer backbones that introduces a learnable prompt matrix $\mathbf{P}_0\in\mathbb{R}^{p\times d}$ solely at the input to the first Transformer layer. Only this prompt is updated; all backbone parameters remain frozen. Prompt tokens are placed between the class token and the patch embeddings in the input to $L_1$. No further prompt tokens are introduced at deeper layers (Xiao et al., 10 Jul 2025, Jia et al., 2022).

Vertical Pressure with Quadratic Profile—Shallow (Fluid context):

VPT-Shallow in fluid modeling refers to a quadratic non-hydrostatic pressure profile within a shallow-water Eulerian framework, arising from assuming linear vertical velocity variation with depth. This quadratic profile is necessary for consistency with Boussinesq-type wave equations and is crucial for simulating events such as moving-bottom-generated tsunamis and landslides (Firdaus et al., 2024).
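
The link between a linear vertical velocity and a quadratic pressure profile can be sketched heuristically. This is an illustrative argument under stated assumptions, not the derivation of the cited paper: the horizontal velocity is taken as depth-independent, $\xi$ is a normalized height introduced here, and $a$, $b$ are placeholder coefficients for the vertical acceleration.

```latex
% Assumption: w is linear in z and u is depth-independent, so Dw/Dt is linear in z.
\frac{Dw}{Dt} = a + b\,\xi, \qquad \xi = \frac{z+d}{h} \in [0,1]

% Non-hydrostatic vertical momentum, integrated down from the surface
% (P^{nh} = 0 at \xi = 1):
\frac{1}{\rho}\frac{\partial P^{nh}}{\partial z} = -\frac{Dw}{Dt}
\;\;\Longrightarrow\;\;
P^{nh}(\xi) = \rho h\left[a\,(1-\xi) + \tfrac{b}{2}\,(1-\xi^{2})\right]

% Depth average and bottom value of the quadratic profile:
\frac{1}{h}\int_{-d}^{-d+h} P^{nh}\,dz = \rho h\left(\tfrac{a}{2} + \tfrac{b}{3}\right),
\qquad
P^{nh}\big|_{z=-d} = \rho h\left(a + \tfrac{b}{2}\right)
```

With $a = 0$, as one would expect over a flat, fixed bed where the vertical acceleration vanishes at the bottom, the bottom value is $3/2$ times the depth average. This agrees with the factor $6/(4+d_x^2)$ in the bottom-pressure formula of Section 2 evaluated at $d_x = 0$.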

2. Mathematical Formulation and Mechanism

Vision Transformers:

Let $N$ be the transformer depth, $d$ the embedding size, $p$ the prompt token count, and $k$ the number of patch tokens.

  • Layer 1 input: $[\mathbf{x}_0;\,\mathbf{P}_0;\,\mathbf{E}_0] \in \mathbb{R}^{(1+p+k)\times d}$
  • Layer 1 output: $[\mathbf{x}_1;\,\mathbf{Z}_1;\,\mathbf{E}_1] = L_1([\mathbf{x}_0;\,\mathbf{P}_0;\,\mathbf{E}_0])$
  • Layers $2,\ldots,N$: standard forward pass, with each layer $L_i$ consuming the outputs of $L_{i-1}$; no new prompts are injected.
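
The token bookkeeping above can be sketched in a few lines of plain Python. This is a minimal illustration, not the reference implementation: `toy_layer` is a hypothetical stand-in for a frozen Transformer layer, and the sizes `d`, `p`, `k` are arbitrary small values chosen for readability.

```python
# Toy shapes only: d, p, k and the mixing "layer" are illustrative assumptions.
def toy_layer(tokens):
    """Stand-in for a frozen Transformer layer: mixes every token with the sequence mean.
    Like a real layer, it preserves the sequence length and the embedding width."""
    n, d = len(tokens), len(tokens[0])
    mean = [sum(t[j] for t in tokens) / n for j in range(d)]
    return [[0.5 * t[j] + 0.5 * mean[j] for j in range(d)] for t in tokens]


def vpt_shallow_forward(cls_token, prompts, patches, depth):
    """Insert the prompts once, before layer 1, then run all layers unchanged."""
    seq = [cls_token] + prompts + patches   # [x_0; P_0; E_0], shape (1 + p + k, d)
    for _ in range(depth):                  # layers 1..N, no new prompts injected
        seq = toy_layer(seq)
    return seq


d, p, k, depth = 4, 2, 3, 12
cls_token = [1.0] * d
prompts = [[0.1 * (i + 1)] * d for i in range(p)]   # the only learnable prompt tensor
patches = [[float(i)] * d for i in range(k)]

out = vpt_shallow_forward(cls_token, prompts, patches, depth)
seq_len, width = len(out), len(out[0])
print(seq_len, width)   # 6 4
```

The sequence length stays at $1+p+k$ through all layers because the prompts enter only once; a deep variant would instead replace the prompt rows before every layer.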

Fluid Modeling:

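Starting from the incompressible Euler equations with a moving bottom, the depth-averaged system is given below. Here $h$ is the total water depth, $u$ and $w$ are the depth-averaged horizontal and vertical velocities, the bottom lies at $z=-d(x,t)$, $g$ is gravitational acceleration, $\rho$ is the density, and $p^{nh}$ is the depth-averaged non-hydrostatic pressure: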

  • Continuity: $h_t + (hu)_x = 0$
  • Horizontal momentum: $(hu)_t + (hu^2 + \frac{1}{2}gh^2)_x = gh\,d_x - (h\,p^{nh})_x/\rho + P^{nh}|_{z=-d}\,d_x/\rho$
  • Vertical momentum: $(hw)_t + (huw)_x = P^{nh}|_{z=-d}/\rho$
  • Kinematic constraint: $2hw + hu\,(2d-h)_x + 2h\,d_t = -h\,(hu)_x$
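
To illustrate what the conservative form of the continuity equation buys, here is a small sketch of one time step on a periodic grid. The first-order upwind finite-volume scheme is an assumption made for brevity; it is not the discontinuous Galerkin discretization used in the cited work, and the grid values are made up.

```python
def continuity_step(h, u_face, dt, dx):
    """One first-order upwind finite-volume step for h_t + (hu)_x = 0 on a periodic grid.
    u_face[i] is the velocity at the face between cell i and cell i + 1."""
    n = len(h)
    flux = []
    for i in range(n):
        right = (i + 1) % n
        # Upwind choice: take h from the cell the flow is coming from.
        donor = h[i] if u_face[i] >= 0.0 else h[right]
        flux.append(donor * u_face[i])
    # Each face flux leaves one cell and enters its neighbour, so total mass is unchanged.
    # flux[i - 1] wraps around to the last face for i = 0 (periodic boundary).
    return [h[i] - dt / dx * (flux[i] - flux[i - 1]) for i in range(n)]


h0 = [1.0, 1.2, 1.5, 1.1, 0.9, 1.0]
u_face = [0.3, 0.2, -0.1, 0.4, 0.1, 0.2]
h1 = continuity_step(h0, u_face, dt=0.1, dx=1.0)
mass_before, mass_after = sum(h0), sum(h1)
```

Because every face flux is added to one cell and subtracted from its neighbour, `mass_before` and `mass_after` agree up to floating-point rounding regardless of the velocity field.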

Non-hydrostatic bottom pressure for quadratic profile:

$P^{nh}|_{z=-d} = \frac{6}{4 + d_x^2}\;p^{nh} + \frac{d_x}{4 + d_x^2}\,(h\,p^{nh})_x + \phi$

where $\phi$ collects known quantities (Firdaus et al., 2024).
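
As a quick numerical reading of the bottom-pressure formula, the following sketch evaluates it pointwise. The function name and argument names are hypothetical, the inputs are made-up numbers, and $\phi$ is left as a caller-supplied value because its explicit form is not given here.

```python
def bottom_pressure(p_nh, hp_nh_x, d_x, phi=0.0):
    """Evaluate P^{nh}|_{z=-d} = 6/(4 + d_x^2) p^{nh} + d_x/(4 + d_x^2) (h p^{nh})_x + phi.

    p_nh    : non-hydrostatic pressure p^{nh} at the point
    hp_nh_x : x-derivative of h * p^{nh} at the point
    d_x     : bottom slope
    phi     : remaining known terms (form not specified here; defaults to 0)
    """
    denom = 4.0 + d_x ** 2
    return 6.0 / denom * p_nh + d_x / denom * hp_nh_x + phi


flat = bottom_pressure(p_nh=2.0, hp_nh_x=0.7, d_x=0.0)     # 1.5 * 2.0 = 3.0
sloped = bottom_pressure(p_nh=2.0, hp_nh_x=0.7, d_x=1.0)   # (6 * 2.0 + 0.7) / 5 = 2.54
```

On a flat bottom the gradient term drops out and the coefficient reduces to $6/4 = 3/2$; a nonzero slope both lowers that coefficient and couples the bottom pressure to the horizontal gradient of $h\,p^{nh}$.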

3. Parameter Efficiency and Empirical Performance

Vision Transformers:

VPT-Shallow updates only $p\times d$ prompt parameters (e.g., for ViT-Base/16 with $p = 50$: $50 \cdot 768 = 38{,}400$; reported as 0.16% of the full model), compared with VPT-Deep’s per-layer prompts ($N \cdot 50 \cdot 768 = 460{,}800$ for $N = 12$; reported as 0.73%) or full fine-tuning (all 85.8M parameters). Empirical results (mean accuracies over benchmarks):

  • FGVC: 84.62%
  • HTA: 85.5%
  • VTAB-1k: 64.85%

VPT-Shallow is particularly competitive for simple domains, but shows reduced adaptation capacity in fine-grained/distribution-shift settings compared to VPT-Deep and ViaPT (Xiao et al., 10 Jul 2025, Jia et al., 2022).
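
The prompt parameter counts quoted in this section can be reproduced with simple arithmetic. The helper below is a hypothetical name used only for illustration; the backbone size is the 85.8M figure from the text, and 12 layers is the standard depth of ViT-Base.

```python
def prompt_params(p, d, layers=1):
    """Number of learnable prompt entries: p tokens of width d at each prompted layer."""
    return p * d * layers


BACKBONE_PARAMS = 85.8e6                     # ViT-Base/16 size quoted in the text
shallow = prompt_params(50, 768)             # prompts at layer 1 only
deep = prompt_params(50, 768, layers=12)     # prompts at every layer
shallow_pct = 100.0 * shallow / BACKBONE_PARAMS
deep_pct = 100.0 * deep / BACKBONE_PARAMS
print(shallow, deep, round(shallow_pct, 3), round(deep_pct, 3))
# 38400 460800 0.045 0.537
```

The prompt-only ratios come out lower than the quoted 0.16% and 0.73%. One plausible reading, which is an assumption here and not something the text states, is that the quoted percentages also count other trainable parameters, such as a task-specific classification head, or are averaged over tasks with different prompt lengths.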

Fluid Modeling:

VPT-Shallow’s elliptic system for dispersion and moving-bottom forcing adds only two prognostic variables ($hw$, $p^{nh}$) per time step and keeps computational overhead low compared to full Boussinesq-type models. Validation against laboratory and benchmark test cases demonstrates faithful reproduction of waveform amplitude, phase, and dispersive effects, with lower cost per time step than higher-order models (Firdaus et al., 2024).

4. Conceptual Motivation and Regime Interpretation

Vision Transformers:

VPT-Shallow represents one extreme of the adaptation spectrum: maximal information flow with zero layer-specific prompt adaptation after $L_1$. In ViaPT’s unified framework, it corresponds to retaining all PCA dimensions ($m=d$) across all layers post-prompt fusion, limiting instance- or layer-specific flexibility. Conversely, VPT-Deep discards previous prompts entirely ($m=0$), maximizing deep adaptation. Random reduction yields intermediate performance, lacking principled retention of variance (Xiao et al., 10 Jul 2025).

Fluid Modeling:

Quadratic vertical pressure profiles are physically motivated by shallow-layer assumptions and the requirement to capture essential non-hydrostatic effects for realistic wave dynamics. The formulation avoids ambiguously defined time derivatives that preclude projection methods, yielding stable systems that can be solved with local discontinuous Galerkin (LDG) methods and are suitable for operational tsunami and landslide forecasting (Firdaus et al., 2024).

5. Practical Implementation and Application Contexts

Vision Transformers:

  • Implementation requires only the addition of the $p\times d$ prompt matrix at $L_1$, with the backbone frozen.
  • Training uses standard optimizers (SGD/AdamW), cosine decay schedules, and modest batch sizes.
  • Storage and inference overheads are minimal (0.16% for $p=50$), enabling multi-task or resource-constrained deployment (Jia et al., 2022).
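
The "frozen backbone, trainable prompt" recipe can be illustrated with a deliberately tiny model. Everything here is a toy assumption: the "backbone" is mean pooling followed by a fixed linear head `w`, the loss is a squared error on one example, and plain gradient descent stands in for the SGD/AdamW optimizers mentioned above.

```python
def predict(prompts, patches, w):
    """Toy frozen model: mean-pool all tokens, then apply a fixed linear head w."""
    tokens = prompts + patches
    n, d = len(tokens), len(w)
    pooled = [sum(t[j] for t in tokens) / n for j in range(d)]
    return sum(w[j] * pooled[j] for j in range(d))


def tune_prompts(prompts, patches, w, target, lr=0.5, steps=50):
    """Gradient descent on the squared error, updating the prompt entries only."""
    n = len(prompts) + len(patches)
    prompts = [row[:] for row in prompts]      # work on a copy; w and patches are never modified
    for _ in range(steps):
        err = predict(prompts, patches, w) - target
        for row in prompts:
            for j in range(len(w)):
                # d/dP_ij of (pred - target)^2 = 2 * err * w_j / n
                row[j] -= lr * 2.0 * err * w[j] / n
    return prompts


w = [0.5, -1.0, 2.0]                           # frozen "backbone" weights
patches = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.5]]   # fixed input tokens
prompts0 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # learnable prompt, zero-initialized
target = 3.0

loss_before = (predict(prompts0, patches, w) - target) ** 2
prompts1 = tune_prompts(prompts0, patches, w, target)
loss_after = (predict(prompts1, patches, w) - target) ** 2
```

Only `prompts1` needs to be stored per task; the frozen weights are shared, which is the property that makes multi-task deployment cheap.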

Fluid Modeling:

  • VPT-Shallow can be incorporated into existing shallow-water codes by adding two additional unknowns and solving a $2\times 2$ elliptic system per time step via LDG methods.
  • Suitable for tsunami, landslide, and earthquake simulations where dispersive effects and moving-bottom generation are critical.
  • Limitations include neglect of viscous/frictional terms; extension to 2D/3D bathymetries necessitates advanced solvers (Firdaus et al., 2024).

6. Comparative Analysis and Future Directions

Vision Transformers:

VPT-Shallow excels in efficiency and simplicity, achieving competitive results on diverse visual tasks with minimal adaptation, but underperforms in settings requiring complex, layer-dependent modulation. ViaPT offers a more principled mechanism for dataset- and instance-aware prompt balance, outperforming both shallow and deep schemes on distributionally complex domains.

Fluid Modeling:

The quadratic VPT-Shallow formulation achieves dispersive accuracy comparable to Boussinesq theory without burdensome nonlocal derivatives. Further research aims at integrating friction, extending to higher dimensions, and real-time source inversion (Firdaus et al., 2024).

7. Summary Table: VPT-Shallow Properties in ViT Context

| Variant | Prompt Injection | % Params Tuned (ViT-B/16) | FGVC Accuracy (%) |
|---|---|---|---|
| VPT-Shallow | Only at layer 1 | 0.16 | 84.62 |
| VPT-Deep | At every layer | 0.73 | 89.11 |
| Full Tune | All layers | 100 | 88.54 |

VPT-Shallow thus occupies a distinct niche in both machine learning and fluid dynamics: as the minimal, efficiently implemented prompt adaptation scheme for frozen ViTs, and as the quadratic-profile non-hydrostatic extension to shallow-water modeling. Both usages leverage shallow modifications for substantial operational gains, but both are less expressive than deeper prompt adaptations or higher-complexity non-hydrostatic fluid models (Xiao et al., 10 Jul 2025, Jia et al., 2022, Firdaus et al., 2024).
