OmniArch: Unified Foundation Model
- OmniArch is a unified deep foundation model that integrates scientific computing for PDEs with dense multimodal representation learning.
- It employs an axis-based encoder and a decoder-only Transformer to achieve high parameter efficiency and robust zero-shot generalization across 1D–3D domains.
- Physics-Informed Reinforcement Learning fine-tunes outputs via natural language PDE captions, enabling interactive solver steering and scalable deployment.
OmniArch refers to a distinct line of research and development in deep foundation model architectures, with principal deployments in two technical domains: (1) scientific machine learning for partial differential equations (PDEs) and (2) dense multimodal representation learning. In both cases, the defining feature is the pursuit of unified architectures capable of simultaneously modeling “omni-domain” inputs—whether spatial/temporal/physical (for scientific computing) or image/audio/text (for modality fusion)—with high parameter efficiency and robust out-of-distribution generalization.
1. Core Architecture for Scientific Computing
OmniArch, as introduced in "OmniArch: Building Foundation Model For Scientific Computing," presents a unified framework to address multi-scale and multi-physics PDE problems, focusing on a physically-aligned, scalable neural operator that consolidates training and inference across 1D, 2D, and 3D spatial domains (Chen et al., 2024).
Encoder–Decoder and Tokenization
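The principal innovation is an axis-based encoder $\mathcal{E}$ that maps each $d$-dimensional coordinate $x$ (space–time) to an $h$-dimensional hidden state:

$$z(x) = \mathcal{E}\big(u(x)\big) \in \mathbb{R}^{h}, \qquad x \in \mathbb{R}^{d},$$

where $u(x)$ is the multi-channel field value. These hidden vectors are flattened (following xVal), producing a token sequence for a standard decoder-only Transformer. Channel-wise grouping is performed at each timestep $t$; tokenization proceeds by concatenating the channel tokens $[z_t^{(1)}, \dots, z_t^{(C)}]$ for the $C$ channels, which are then consumed by the backbone.
A learnable inverse $\mathcal{E}^{-1}$ reconstructs fields at decode time:

$$\hat{u}(x) = \mathcal{E}^{-1}\big(z(x)\big).$$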
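The token ordering described above (channel-wise grouping within each timestep, timesteps concatenated in order) can be sketched as follows. This is a minimal illustration, not the paper's encoder: the learned axis-based encoder is replaced by a plain split into fixed-size patches, and the names `tokens_for_timestep`, `tokenize_trajectory`, and `patch` are hypothetical.

```python
from typing import List

# One timestep of a 1D field, indexed as field[channel][grid_point].
Field = List[List[float]]


def tokens_for_timestep(field: Field, patch: int) -> List[List[float]]:
    """Split every channel into contiguous patches, keeping channels grouped.

    Assumption: a raw patch stands in for the learned hidden vector.
    """
    tokens = []
    for channel in field:  # channel-wise grouping
        if len(channel) % patch != 0:
            raise ValueError("grid length must be divisible by patch size")
        for start in range(0, len(channel), patch):
            tokens.append(channel[start:start + patch])
    return tokens


def tokenize_trajectory(trajectory: List[Field], patch: int) -> List[List[float]]:
    """Flatten a [time][channel][grid] trajectory into one time-major token sequence."""
    sequence = []
    for field in trajectory:
        sequence.extend(tokens_for_timestep(field, patch))
    return sequence


# Two timesteps, two channels, four grid points each.
trajectory = [
    [[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]],
    [[4.0, 5.0, 6.0, 7.0], [14.0, 15.0, 16.0, 17.0]],
]
sequence = tokenize_trajectory(trajectory, patch=2)
```

With this layout, all tokens of one timestep precede those of the next, so a causal decoder only ever conditions on earlier physical states.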
The architecture does not incorporate classic Fourier neural operators or explicit mechanisms for “fading out disharmony across separated dimensions”; such mechanisms are not described in the source.
Transformer Backbone
OmniArch employs a decoder-only Transformer characterized by:
- Hidden size:
- Intermediate size: $4096$
- Attention heads: 16
- Normalization: Pre-norm with RMSNorm
- Position encoding: Rotary position embeddings (RoPE)
- Masking: Causal (shift-right)
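Self-attention follows the standard scaled dot-product form, $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(QK^{\top}/\sqrt{d_k}\right)V$, where $d_k$ is the per-head key dimension. No additional inductive biases for “physical alignment” are introduced in the attention layers.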
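For readers who want to see the causal masking concretely, the following is a minimal single-head sketch of scaled dot-product attention in pure Python. It omits RoPE, RMSNorm, and multi-head projection, and the function name `causal_attention` is an illustrative choice rather than anything taken from the OmniArch code.

```python
import math
from typing import List

Matrix = List[List[float]]


def causal_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v are [sequence_length][dim] lists; position i attends to 0..i only.
    """
    d_k = len(k[0])
    out = []
    for i, q_i in enumerate(q):
        # Scores against the current and all earlier positions.
        scores = [
            sum(a * b for a, b in zip(q_i, k[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the visible positions.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the visible value vectors.
        out.append([
            sum(w * v[j][c] for j, w in enumerate(weights))
            for c in range(len(v[0]))
        ])
    return out


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
attended = causal_attention(q, k, v)
```

The first output row equals the first value vector, because position 0 can see only itself; later rows are convex combinations of the values seen so far.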
Physics-Informed Fine-Tuning
The primary fine-tuning strategy is Physics-Informed Reinforcement Learning (PIRL), which diverges from classic PINN/PDE residual loss formulations. A CLIP-style scorer $S$ produces a scalar reward that measures how well the model output $\hat{u}$ matches a natural-language PDE caption $c$: $r = S(\hat{u}, c)$. Fine-tuning maximizes the expected reward, $\max_{\theta}\, \mathbb{E}_{\hat{u} \sim p_{\theta}}\big[S(\hat{u}, c)\big]$, where $\theta$ denotes the model parameters. There are no explicit physics-constraint or residual terms; all alignment is learned via the reward function.
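A toy version of this reward signal can make the idea concrete. The sketch below assumes that the predicted field and the caption have already been embedded into a shared vector space by some encoders (not shown, and hypothetical here), scores them by cosine similarity as CLIP-style models typically do, and centers the rewards with a mean baseline. The baseline is a common variance-reduction choice in policy-gradient methods and is an assumption, not something the source describes.

```python
import math
from typing import List, Sequence


def cosine_reward(field_embedding: Sequence[float],
                  caption_embedding: Sequence[float]) -> float:
    """Scalar reward: cosine similarity between two embeddings (assumed scorer)."""
    dot = sum(a * b for a, b in zip(field_embedding, caption_embedding))
    norm = (math.sqrt(sum(a * a for a in field_embedding))
            * math.sqrt(sum(b * b for b in caption_embedding)))
    if norm == 0.0:
        return 0.0  # undefined direction: treat as no alignment
    return dot / norm


def centered_rewards(rewards: Sequence[float]) -> List[float]:
    """Subtract the mean reward, as a simple baseline for a policy-gradient update."""
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


caption = [1.0, 0.0]                      # hypothetical caption embedding
candidates = [[1.0, 0.0], [0.0, 1.0]]     # hypothetical embeddings of two sampled outputs
rewards = [cosine_reward(c, caption) for c in candidates]
advantages = centered_rewards(rewards)
```

Outputs whose embedding points in the caption's direction receive a positive centered reward and would be reinforced; the others would be discouraged.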
2. Unified Training on PDEBench
The pre-training scheme pools a comprehensive set of PDE types into a single learning task—solving 1D, 2D, and 3D problems concurrently, as detailed below:
| Dimension | PDE Classes | Train Traj. | Valid. | Test | Time Steps | Spatial Res. |
|---|---|---|---|---|---|---|
| 1D | CFD, Reaction–Diffusion, Advection, Burgers, Diff.-Sorption | 378k | 42k | 28 | 81 | 1024 |
| 2D | CFD, Diff–React, Navier–Stokes, Shallow Water | 42.3k | 300 | 8 | 21 | 128 |
| 3D | 3D CFD | 630 | 70 | 6 | 21 | 64 |
Out-of-domain evaluation includes shock tubes, Kelvin–Helmholtz, TOV, blastwave, and turbulence, establishing the model’s capacity for in-context and zero-shot generalization.
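Training uses a batch-wise normalized RMSE (nRMSE): $\mathrm{nRMSE} = \frac{1}{\sigma}\sqrt{\frac{1}{N}\sum_{i=1}^{N}(\hat{u}_i - u_i)^2}$, where $\hat{u}_i$ and $u_i$ are the predicted and reference values over the $N$ entries of a batch and $\sigma$ is the batch-wise field standard deviation.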
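The metric can be sketched in a few lines of standard-library Python. This version assumes that the normalizing standard deviation is taken over the reference (ground-truth) values of the whole batch; the source only says "batch-wise field standard deviation", so that choice and the name `batch_nrmse` are assumptions.

```python
import math
from typing import Sequence


def batch_nrmse(pred: Sequence[Sequence[float]],
                target: Sequence[Sequence[float]]) -> float:
    """RMSE over every entry of the batch, divided by the std of the targets."""
    flat_pred = [x for sample in pred for x in sample]
    flat_target = [x for sample in target for x in sample]
    if len(flat_pred) != len(flat_target):
        raise ValueError("pred and target must have the same number of entries")
    n = len(flat_target)
    mse = sum((p - t) ** 2 for p, t in zip(flat_pred, flat_target)) / n
    mean_t = sum(flat_target) / n
    sigma = math.sqrt(sum((t - mean_t) ** 2 for t in flat_target) / n)
    if sigma == 0.0:
        raise ValueError("target field is constant; nRMSE is undefined")
    return math.sqrt(mse) / sigma


target = [[0.0, 2.0], [0.0, 2.0]]
shifted = [[1.0, 3.0], [1.0, 3.0]]   # every prediction is off by exactly 1
perfect_score = batch_nrmse(target, target)
shifted_score = batch_nrmse(shifted, target)
```

Dividing by the field's spread makes errors comparable across PDE families whose raw magnitudes differ by orders of magnitude.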
3. Performance Characteristics and Adaptability
Quantitative Results
Comparison against baselines demonstrates substantial improvement on 1D and 2D tasks:
- 1D_CFD: nRMSE reduced from 0.0981 (best baseline) to 0.0392 (OmniArch+PIRL, about 60% lower)
- Burgers: 0.0067 → 0.0032 (about 52% lower)
- 2D_CFD: 0.0994 → 0.0153 (about 85% lower)
- 3D_CFD: 0.6600 (OmniArch) vs. 0.3050 (FNO); current 3D performance lags the state-of-the-art.
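The relative changes implied by the figures above can be checked with a short calculation. The helper `relative_reduction` is an illustrative name; the numbers are the nRMSE values quoted in the list, with the best baseline first and OmniArch second.

```python
def relative_reduction(baseline: float, model: float) -> float:
    """Percentage by which `model` error is lower than `baseline` error.

    A negative result means the model is worse than the baseline.
    """
    return 100.0 * (baseline - model) / baseline


# (best baseline nRMSE, OmniArch nRMSE) as quoted in the text.
results = {
    "1D_CFD": (0.0981, 0.0392),
    "Burgers": (0.0067, 0.0032),
    "2D_CFD": (0.0994, 0.0153),
    "3D_CFD": (0.3050, 0.6600),  # FNO vs. OmniArch
}

reductions = {name: relative_reduction(b, m) for name, (b, m) in results.items()}
for name, pct in reductions.items():
    print(f"{name}: {pct:+.1f}%")
```

The sign flips for 3D_CFD, where OmniArch's error is more than twice that of FNO, which matches the statement that 3D performance currently lags.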
Generalization
Zero-shot out-of-domain generalization on 2D flows is strong (nRMSE: 0.2126, 0.5432, 0.1718 on Shock, KH, OTVortex, respectively), surpassing the FNO, UNet, and MPP baselines. Increasing the prompt length improves performance even without explicit time-feature input, which suggests that the model infers temporal derivatives implicitly from the sequence of preceding states.
Dynamic prompt sizing allows an inference speed-up at a moderate loss of accuracy, supporting high-throughput applications.
4. Engineering and Scientific Impact
OmniArch’s fusion of 1D–3D PDE surrogate modeling facilitates rapid deployment in diverse settings including computational fluid dynamics (CFD) for aircraft, weather prediction, and semiconductor process simulation. PIRL-based alignment with textual descriptions creates opportunities for interactive solver steering, automated tuning, and scientific hypothesis generation via “learned heuristics.” The architecture’s zero-shot generalization capacity suggests emergent operator representations, laying groundwork for inverse design and automated PDE discovery through self-supervised exploration (Chen et al., 2024).
5. Related Paradigms: Multimodal Omni-Architectures
“OmniArch” (as a term) is sometimes conflated with other “omni” architectures in adjacent fields, particularly in modality fusion.
- Omni-C (Lau et al., 27 Feb 2026) employs a single dense ViT backbone with small modality-specific MLP heads and single-modality contrastive losses. It achieves competitive unimodal and cross-modal retrieval/recognition with ≈45% reduction in parameter/memory footprint relative to separate expert encoders, and supports sequentially processed inference for edge devices.
- HyperCLOVA X 8B Omni (Team, 5 Jan 2026) implements a 36-layer Transformer generating any-to-any text, audio, and vision outputs from a unified interleaved sequence—employing both discrete and continuous embeddings in a shared autoregressive backbone, and delivering leading performance on language, vision-text, and audio tasks.
OmniArch in scientific computing is distinguished by its physical modeling objectives and reinforcement-style physics alignment, whereas omnimodal models in representation learning target cross-modal data fusion, compression, and generative capabilities.
6. Limitations and Future Research Directions
For scientific PDE surrogate modeling, current limitations include suboptimal 3D performance relative to domain-specific neural operators (e.g., FNO), and the opaque nature of the alignment induced by PIRL—there are no hard physics constraints, and interpretation of learned reward alignment remains open. Future work is suggested in extending 3D efficacy and leveraging the architecture for inverse design, parameter estimation, and novel PDE term discovery through large-scale, self-supervised dataset mining (Chen et al., 2024).
For multimodal omni-architectures, trade-offs persist between parameter sharing and modality conflict, with per-modality heads or codebooks required to prevent performance loss. The area remains active, with further research needed on scaling, alignment without explicit paired supervision, and deeper integration of continuous and discrete modalities for both perception and generation tasks (Lau et al., 27 Feb 2026, Team, 5 Jan 2026).