OmniArch: Unified Foundation Model

Updated 25 March 2026

OmniArch is a unified deep foundation model that integrates scientific computing for PDEs with dense multimodal representation learning.
It employs an axis-based encoder and a decoder-only Transformer to achieve high parameter efficiency and robust zero-shot generalization across 1D–3D domains.
Physics-Informed Reinforcement Learning fine-tunes outputs via natural language PDE captions, enabling interactive solver steering and scalable deployment.

OmniArch refers to a distinct line of research and development in deep foundation model architectures, with principal deployments in two technical domains: (1) scientific machine learning for partial differential equations (PDEs) and (2) dense multimodal representation learning. In both cases, the defining feature is the pursuit of unified architectures capable of simultaneously modeling “omni-domain” inputs—whether spatial/temporal/physical (for scientific computing) or image/audio/text (for modality fusion)—with high parameter efficiency and robust out-of-distribution generalization.

1. Core Architecture for Scientific Computing

OmniArch, as introduced in "OmniArch: Building Foundation Model For Scientific Computing," presents a unified framework to address multi-scale and multi-physics PDE problems, focusing on a physically-aligned, scalable neural operator that consolidates training and inference across 1D, 2D, and 3D spatial domains (Chen et al., 2024).

Encoder–Decoder and Tokenization

The principal innovation is an axis-based encoder $E$ mapping an $M$-dimensional coordinate (space–time) to an $H$-dimensional hidden state: $\mathbf{h}_{\mathbf{x},t} = E\bigl(\mathcal{F}(\mathbf{x},t)\bigr) \in \mathbb{R}^H$, where $\mathcal{F}(\mathbf{x},t) \in \mathbb{R}^C$ is the multi-channel field value. These hidden vectors are flattened (following xVal), producing a token sequence for a standard decoder-only Transformer. Channel-wise grouping is performed at each timestep; tokenization proceeds by concatenating $C$ channel tokens $\mathbf{T}_t = [\mathbf{h}_{1,t};\dots;\mathbf{h}_{C,t}]$, which are then consumed by the backbone.

A learnable inverse $E^{-1}$ reconstructs fields at decode time: $\hat{\mathcal{F}}(\mathbf{x},t) = E^{-1}\bigl(\hat{\mathbf{h}}_{\mathbf{x},t}\bigr)$
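The encode, tokenize, and decode round trip can be illustrated with a small sketch. It is not the OmniArch encoder: it assumes an xVal-style embedding in which each channel value scales a per-channel vector, and it inverts that map by projection. The names `make_channel_embeddings`, `encode`, `decode`, and `tokenize_trajectory`, the hidden size of 8, and the random vectors standing in for learned parameters are all hypothetical.

```python
import random

H = 8  # demo hidden size (assumption; chosen small for readability)


def make_channel_embeddings(num_channels, hidden=H, seed=0):
    """Random stand-ins for learned per-channel embedding vectors u_c."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(hidden)]
            for _ in range(num_channels)]


def encode(field_values, embeddings):
    """Map C scalar channel values to C hidden vectors: h_c = value_c * u_c."""
    return [[value * u_i for u_i in u]
            for value, u in zip(field_values, embeddings)]


def decode(tokens, embeddings):
    """Recover each scalar by projecting its token back onto u_c."""
    recovered = []
    for h, u in zip(tokens, embeddings):
        norm_sq = sum(u_i * u_i for u_i in u)
        recovered.append(sum(h_i * u_i for h_i, u_i in zip(h, u)) / norm_sq)
    return recovered


def tokenize_trajectory(trajectory, embeddings):
    """Concatenate the C channel tokens of every timestep into one sequence."""
    sequence = []
    for field_values in trajectory:
        sequence.extend(encode(field_values, embeddings))
    return sequence


embeddings = make_channel_embeddings(num_channels=3)
trajectory = [[1.0, -0.5, 2.0], [0.9, -0.4, 2.1]]  # 2 timesteps, C = 3
tokens = tokenize_trajectory(trajectory, embeddings)
recovered = decode(tokens[:3], embeddings)  # first timestep only
```

Here the sequence holds $C$ tokens per timestep, so two timesteps with three channels give six tokens, and decoding the first three tokens returns the first timestep's channel values up to floating-point error.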

The architecture does not incorporate classic Fourier neural operators or explicit mechanisms for “fading out disharmony across separated dimensions”; such mechanisms are not described in the source.

Transformer Backbone

OmniArch employs a decoder-only Transformer characterized by:

Hidden size: $d_\mathrm{model} = 1024$
Intermediate size: $4096$
Attention heads: 16
Normalization: Pre-norm with RMSNorm
Position encoding: Rotary position embeddings (RoPE)
Masking: Causal (shift-right)

Self-attention takes the standard form: $\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)\,V$. No additional inductive biases for “physical alignment” are introduced in the attention layers.
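As a rough illustration of the listed hyperparameters and the causal attention rule, the sketch below stores the configuration in a small dataclass and implements single-head scaled dot-product attention in plain Python. The names `BackboneConfig` and `causal_attention` are hypothetical, and RoPE, RMSNorm, and multi-head splitting are omitted for brevity.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BackboneConfig:
    d_model: int = 1024
    intermediate_size: int = 4096
    num_heads: int = 16

    @property
    def head_dim(self) -> int:
        # Per-head width d_k when d_model is split evenly across heads.
        return self.d_model // self.num_heads


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(Q, K, V):
    """Single-head attention where token i attends only to tokens 0..i."""
    d_k = len(K[0])
    outputs = []
    for i, q in enumerate(Q):
        scores = [sum(a * b for a, b in zip(q, K[j])) / math.sqrt(d_k)
                  for j in range(i + 1)]
        weights = softmax(scores)
        outputs.append([sum(w * V[j][c] for j, w in enumerate(weights))
                        for c in range(len(V[0]))])
    return outputs


config = BackboneConfig()
Q = K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
attended = causal_attention(Q, K, V)
```

With 16 heads over a hidden size of 1024, each head works in 64 dimensions. Because of the causal mask, the first token can attend only to itself, so its output equals its own value vector.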

Physics-Informed Fine-Tuning

The primary fine-tuning strategy is Physics-Informed Reinforcement Learning (PIRL), which diverges from classic PINN/PDE residual loss formulations. A CLIP-style scorer $S$ produces a scalar reward that measures how well the model output agrees with a natural-language PDE caption $\mathbf{t}$: $r = S\bigl(\hat{\mathbf{x}}_{t+1}, \mathbf{t}\bigr)$. Fine-tuning maximizes the expected reward: $\theta_G^* = \arg\max_{\theta_G} \mathbb{E}_{(\mathbf{X}_{0:t},\mathbf{t})\sim\mathcal{D}}\bigl[ S\bigl(G(\mathbf{X}_{0:t};\theta_G),\mathbf{t}\bigr) \bigr]$, where $G(\mathbf{X}_{0:t};\theta_G)$ is the model's prediction from the history $\mathbf{X}_{0:t}$. There are no explicit physics-constraint or residual terms; all alignment is learned via the reward function.
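The objective can be made concrete with a toy scorer. The sketch below assumes the CLIP-style score is a cosine similarity between an embedding of the predicted field and an embedding of the caption. The source does not specify the scorer's internals, so the encoders are left out and the embeddings are hypothetical inputs. It shows only the quantity being maximized, not the policy update.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def reward(prediction_embedding, caption_embedding):
    """Stand-in for r = S(x_hat, t): higher when prediction matches caption."""
    return cosine_similarity(prediction_embedding, caption_embedding)


def expected_reward(batch):
    """Monte Carlo estimate of the expectation over (prediction, caption) pairs."""
    return sum(reward(pred, cap) for pred, cap in batch) / len(batch)


# Hypothetical embeddings: one aligned pair and one orthogonal pair.
batch = [([1.0, 0.0], [1.0, 0.0]),
         ([1.0, 0.0], [0.0, 1.0])]
mean_reward = expected_reward(batch)
```

In an actual fine-tuning loop the prediction embedding would depend on $\theta_G$, and the parameters would be updated to increase this batch average.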

2. Unified Training on PDEBench

The pre-training scheme pools a comprehensive set of PDE types into a single learning task—solving 1D, 2D, and 3D problems concurrently, as detailed below:

Dimension	PDE Classes	Train Traj.	Valid.	Test	$N_t$	$N_s$
1D	CFD, Reaction–Diffusion, Advection, Burgers, Diff.-Sorption	378k	42k	28	81	1024
2D	CFD, Diff–React, Navier–Stokes, Shallow Water	42.3k	300	8	21	128
3D	3D CFD	630	70	6	21	64

Out-of-domain evaluation includes shock tubes, Kelvin–Helmholtz (KH) instability, the Orszag–Tang vortex (OTVortex), blast waves, and turbulence, and is used to assess the model’s capacity for in-context and zero-shot generalization.

Training utilizes batch-wise normalized RMSE (nRMSE): $\mathrm{nRMSE}(B) = \frac{1}{|B|} \sqrt{\sum_{(\mathbf{x},t)\in B} \left(\frac{\mathcal{F}(\mathbf{x},t) - \hat{\mathcal{F}}(\mathbf{x},t)}{\sigma_{\mathcal{F}}}\right)^2}$ where $\sigma_{\mathcal{F}}$ is the batch-wise field standard deviation.
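A direct transcription of this formula, as a minimal sketch: it follows the expression exactly as written above, with the $1/|B|$ factor outside the square root, and takes $\sigma_{\mathcal{F}}$ to be the population standard deviation of the target values in the batch (an assumption, since the text does not say which estimator is used). The function name `batch_nrmse` is hypothetical.

```python
import math
import statistics


def batch_nrmse(targets, predictions):
    """nRMSE over one batch of scalar field values, per the formula above."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    sigma = statistics.pstdev(targets)  # batch-wise field standard deviation
    if sigma == 0.0:
        raise ValueError("constant target field: sigma is zero")
    squared = sum(((f - f_hat) / sigma) ** 2
                  for f, f_hat in zip(targets, predictions))
    return math.sqrt(squared) / len(targets)


targets = [1.0, 2.0, 3.0, 4.0]
perfect = batch_nrmse(targets, targets)

# Shift every prediction by exactly one standard deviation.
sigma = statistics.pstdev(targets)
shifted = batch_nrmse(targets, [f + sigma for f in targets])
```

A perfect prediction gives zero. When every prediction is off by one standard deviation, each normalized error is 1, so a batch of four yields $\sqrt{4}/4 = 0.5$.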

3. Performance Characteristics and Adaptability

Quantitative Results

Comparison against baselines demonstrates substantial improvement on 1D and 2D tasks:

1D_CFD: nRMSE reduced from 0.0981 (best baseline) to 0.0392 (OmniArch+PIRL, $-60.0\%$)
Burgers: 0.0067 $\to$ 0.0032 ($-52.2\%$)
2D_CFD: 0.0994 $\to$ 0.0153 ($-84.6\%$)
3D_CFD: 0.6600 (OmniArch) vs. 0.3050 (FNO); current 3D performance lags the state-of-the-art.
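The relative change implied by each pair of nRMSE values can be recomputed from the endpoints quoted in the list. The short check below uses only those endpoint values. The helper name `relative_change` is hypothetical, and for 3D_CFD the FNO figure is treated as the baseline.

```python
def relative_change(baseline, new):
    """Signed percentage change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100.0


# (baseline nRMSE, OmniArch nRMSE) as quoted in the list above.
endpoints = {
    "1D_CFD": (0.0981, 0.0392),
    "Burgers": (0.0067, 0.0032),
    "2D_CFD": (0.0994, 0.0153),
    "3D_CFD": (0.3050, 0.6600),  # FNO vs. OmniArch
}

changes = {task: round(relative_change(base, new), 1)
           for task, (base, new) in endpoints.items()}

for task, pct in changes.items():
    print(f"{task}: {pct:+.1f}%")
```

Negative values are error reductions. The positive 3D_CFD value reflects that OmniArch's error there is a little more than twice that of FNO.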

Generalization

In zero-shot out-of-domain evaluation on 2D flows, OmniArch reaches nRMSE of 0.2126, 0.5432, and 0.1718 on Shock, KH, and OTVortex, respectively, surpassing the FNO, UNet, and MPP baselines. Increasing prompt length improves performance even without explicit time-feature input, which suggests that the model infers temporal derivatives implicitly from context.

Dynamic prompt sizing allows a $10\times$ inference speed-up at moderate accuracy loss, supporting high-throughput applications.
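One way to picture prompt sizing is as a cap on how many past timesteps are fed to the model at each autoregressive step. The sketch below is an assumption about the mechanism, not the method in the source, which does not say how the prompt length is chosen. The names `truncate_prompt`, `rollout`, and `linear_extrapolation` are hypothetical, and the toy extrapolator stands in for the trained model.

```python
def truncate_prompt(history, max_steps):
    """Keep only the most recent `max_steps` timesteps as context."""
    if max_steps < 1:
        raise ValueError("max_steps must be at least 1")
    return history[-max_steps:]


def rollout(history, step_fn, num_steps, max_prompt_steps):
    """Autoregressive rollout that re-truncates the prompt before each step."""
    states = list(history)
    for _ in range(num_steps):
        prompt = truncate_prompt(states, max_prompt_steps)
        states.append(step_fn(prompt))
    return states[len(history):]


def linear_extrapolation(prompt):
    """Toy stand-in for the model: extend the last two states linearly."""
    if len(prompt) < 2:
        return list(prompt[-1])
    return [2.0 * b - a for a, b in zip(prompt[-2], prompt[-1])]


history = [[0.0], [1.0], [2.0], [3.0]]
short_context = rollout(history, linear_extrapolation, 2, max_prompt_steps=2)
long_context = rollout(history, linear_extrapolation, 2, max_prompt_steps=4)
```

For this toy model both prompt lengths give the same forecast, because only the last two states matter. With a Transformer, a shorter prompt means fewer tokens per step and therefore less attention compute, traded against whatever accuracy the longer context would have provided.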

4. Engineering and Scientific Impact

OmniArch’s fusion of 1D–3D PDE surrogate modeling facilitates rapid deployment in diverse settings including computational fluid dynamics (CFD) for aircraft, weather prediction, and semiconductor process simulation. PIRL-based alignment with textual descriptions creates opportunities for interactive solver steering, automated tuning, and scientific hypothesis generation via “learned heuristics.” The architecture’s zero-shot generalization capacity suggests emergent operator representations, laying groundwork for inverse design and automated PDE discovery through self-supervised exploration (Chen et al., 2024).

5. Related Paradigms: Multimodal Omni-Architectures

“OmniArch” (as a term) is sometimes conflated with other “omni” architectures in adjacent fields, particularly in modality fusion.

Omni-C (Lau et al., 27 Feb 2026) employs a single dense ViT backbone with small modality-specific MLP heads and single-modality contrastive losses. It achieves competitive unimodal and cross-modal retrieval/recognition with ≈45% reduction in parameter/memory footprint relative to separate expert encoders, and supports sequentially processed inference for edge devices.
HyperCLOVA X 8B Omni (Team, 5 Jan 2026) implements a 36-layer Transformer generating any-to-any text, audio, and vision outputs from a unified interleaved sequence—employing both discrete and continuous embeddings in a shared autoregressive backbone, and delivering leading performance on language, vision-text, and audio tasks.

OmniArch in scientific computing is distinguished by its physical modeling objectives and reinforcement-style physics alignment, whereas omnimodal models in representation learning target cross-modal data fusion, compression, and generative capabilities.

6. Limitations and Future Research Directions

For scientific PDE surrogate modeling, current limitations include suboptimal 3D performance relative to domain-specific neural operators (e.g., FNO), and the opaque nature of the alignment induced by PIRL—there are no hard physics constraints, and interpretation of learned reward alignment remains open. Future work is suggested in extending 3D efficacy and leveraging the architecture for inverse design, parameter estimation, and novel PDE term discovery through large-scale, self-supervised dataset mining (Chen et al., 2024).

For multimodal omni-architectures, trade-offs persist between parameter sharing and modality conflict, with per-modality heads or codebooks required to prevent performance loss. The area remains active, with further research needed on scaling, alignment without explicit paired supervision, and deeper integration of continuous and discrete modalities for both perception and generation tasks (Lau et al., 27 Feb 2026, Team, 5 Jan 2026).

References

Chen et al. (2024). OmniArch: Building Foundation Model For Scientific Computing.

Lau et al. (27 Feb 2026). Omni-C: Compressing Heterogeneous Modalities into a Single Dense Encoder.

Team (5 Jan 2026). HyperCLOVA X 8B Omni.
