
Foundational Models Overview

Updated 6 January 2026
  • Foundation Models are large-scale Transformer-based neural networks pre-trained on diverse data to perform a wide range of tasks without task-specific architectures.
  • They leverage power-law scaling and emergent capabilities, including in-context and zero-shot learning, to deliver cross-domain applications in language, vision, and science.
  • Their deployment raises ethical challenges such as bias amplification, privacy risks, and environmental costs, necessitating robust evaluation and collaborative governance.

A foundation model (FM) is a large-scale neural network—typically Transformer-based—pretrained on a broad, heterogeneous corpus using self-supervised or unsupervised objectives, and subsequently adapted or prompted to solve a diverse array of downstream tasks without bespoke per-task architectures. Distinguished by power-law scaling in both model and dataset size, broad generalization, and the emergence of in-context learning and zero-shot capabilities, FMs have become the substrate for domains as varied as language, vision, biomedicine, and finance. Their proliferation has driven homogenization within the AI ecosystem, new socio-technical challenges, and a paradigm shift in both engineering methodology and scientific discovery (Schneider, 2022, Fu et al., 2024, Liu et al., 17 Oct 2025).

1. Definition, Foundations, and Distinction from Prior Deep Learning

The term “foundation model” refers to a neural network $f_\theta$ trained at unprecedented scale using self-supervised or unsupervised losses (e.g., masked language modeling, contrastive objectives) across massive, heterogeneous datasets. Formally, an FM is the output of a “compilation” process:

$\mathsf{FM} = \mathsf{Train}(\mathcal{D},\,\mathcal{A})$

where $\mathcal{D}$ denotes broad-ranging data and $\mathcal{A}$ specifies the neural architecture (often Transformer variants) (Ran et al., 2024). The core distinctive traits are:

  • Scale: Parameters $P$ and data $D$ in the billions–trillions, e.g., GPT-3 (175B params, 300B tokens).
  • Broad pretraining objectives: Generic, not targeted to a single downstream task.
  • Adaptability: Single set of weights reused for a spectrum of applications via in-context learning, prompting, or lightweight fine-tuning.

Unlike earlier deep models (e.g., BERT, ResNet) built for specific modalities or tasks, FMs reveal qualitatively new emergent properties—chain-of-thought reasoning, zero/few-shot learning—once scaling thresholds are crossed (Schneider, 2022, Fu et al., 2024).

2. Technical Architectures and Scaling Laws

FMs generally rely on Transformer architectures with variations:

  • Autoregressive (decoder-only): GPT-series (next-token prediction).
  • Masked (encoder-only): BERT (bidirectional attention, masked prediction).
  • Encoder–decoder (Seq2Seq): T5, BART (combined encoding/decoding).
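The three families differ chiefly in their attention masks. A minimal numpy sketch (illustrative only, not any specific model's implementation) showing causal versus bidirectional masking:

```python
import numpy as np

def attention_mask(kind: str, n: int) -> np.ndarray:
    """Boolean mask: entry (i, j) is True if position i may attend to position j."""
    if kind == "decoder-only":       # autoregressive: each token sees only the past
        return np.tril(np.ones((n, n), dtype=bool))
    if kind == "encoder-only":       # bidirectional: every token sees the whole sequence
        return np.ones((n, n), dtype=bool)
    raise ValueError(f"unknown kind: {kind}")

# For a 4-token sequence: GPT-style tokens cannot attend forward,
# BERT-style tokens attend everywhere.
causal = attention_mask("decoder-only", 4)
full = attention_mask("encoder-only", 4)
```

An encoder–decoder model combines both: full attention within the encoder, causal attention within the decoder, plus cross-attention from decoder to encoder.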

Scaling laws empirically demonstrate held-out loss decays as a power law:

$L(P, D) - L_\infty \propto P^{-\alpha_P} + D^{-\alpha_D}$

or

$\mathrm{Perf} \propto N^{\beta}, \quad N = \#\text{params or } \#\text{samples}, \quad \beta \approx 0.1\text{–}0.3$

This generalizes across NLP, vision, multimodality, and even time-series (Schneider, 2022, Fu et al., 2024, Chen et al., 7 Jul 2025).
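Because the second form is a power law, it is a straight line on log–log axes, so the exponent $\beta$ can be recovered by linear least squares. A sketch on synthetic data (the numbers are hypothetical, not from any real benchmark):

```python
import numpy as np

# Synthetic "performance vs. scale" data obeying Perf ∝ N^beta with beta = 0.2.
beta_true = 0.2
N = np.array([1e6, 1e7, 1e8, 1e9, 1e10])   # parameter or sample counts
perf = 0.5 * N ** beta_true                # hypothetical performance metric

# On log-log axes the power law is linear: log Perf = beta * log N + const.
slope, intercept = np.polyfit(np.log(N), np.log(perf), 1)
print(f"recovered beta ~ {slope:.3f}")
```

In practice, fits like this are used to extrapolate how much additional data or parameters a target loss would require before committing to a training run.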

3. Emergent and Cross-Domain Capabilities

A hallmark of FMs is emergent behavior—capabilities not evident at small scales, including:

  • In-context learning: Performing unseen tasks from only a few prompt examples, without further parameter updates.
  • Zero/few-shot generalization: Adapting to novel domains with minimal new supervision.
  • Chain-of-thought reasoning and multi-agent intelligence: Partial theory-of-mind (ToM), planning, coordination, and social reasoning, though recent studies show that multi-agent intelligence does not arise directly from single-agent scaling and must be directly targeted in data and training (Hu et al., 9 Dec 2025).
  • Cross-modal unification: Embedding language, images, biological, financial, and temporal data in a shared representation, enabling tasks such as image–text retrieval, multimodal report generation, medical diagnosis, and financial risk assessment (Chen et al., 7 Jul 2025, Ghamizi et al., 16 Jun 2025, Baradwaj et al., 2024).
  • Universal backbones: FMs can serve as starting points for diverse fine-tuned or PEFT-adapted (Parameter-Efficient Fine-Tuning) models, dramatically reducing labeled data requirements (Zhou et al., 30 Jun 2025, Pai et al., 15 Jan 2025, Kang et al., 2023).
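In-context learning requires no weight updates: the "training set" is serialized into the prompt itself. A minimal sketch of few-shot prompt construction (the `Input:`/`Label:` format is illustrative; real chat APIs use their own message schemas):

```python
def few_shot_prompt(examples, query):
    """Serialize (input, label) demonstrations plus a new query into one prompt string."""
    blocks = [f"Input: {x}\nLabel: {y}" for x, y in examples]
    blocks.append(f"Input: {query}\nLabel:")   # the model completes this line
    return "\n\n".join(blocks)

demos = [("The movie was wonderful", "positive"),
         ("Terrible service, never again", "negative")]
prompt = few_shot_prompt(demos, "A delightful surprise")
# A frozen FM infers the task from the demonstrations alone and
# completes the trailing "Label:" with no parameter updates.
```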

4. Ethical, Trustworthiness, and Societal Considerations

FM development and deployment pose unique ethical and socio-technical challenges:

  • Bias and fairness: Systematic risk of amplifying biases from pretraining corpora; documented in clinical FMs (e.g., predicting care cost as severity, propagating race-based medical myths) and financial FMs (risk of hallucinations, factual errors) (Baradwaj et al., 2024, Chen et al., 7 Jul 2025).
  • Privacy and security: Overparameterized models can memorize or leak sensitive details (e.g., membership inference attacks, patient re-identification); mitigation includes dataset de-duplication, differential privacy, secure computation, and federated protocols (Baradwaj et al., 2024, Kang et al., 2023).
  • Interpretability: The opacity of large Transformer-based FMs undermines accountability; current explainability tools (e.g., SHAP, LIME) are only partial solutions (Baradwaj et al., 2024, Fu et al., 2024).
  • Power concentration and homogenization: Control of foundational infrastructures by a handful of organizations shifts the AI capability frontier and risks monoculture (Schneider, 2022).
  • Environmental and financial costs: Training FMs at extreme scale consumes significant compute and energy, raising sustainability barriers (Schneider, 2022).
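Of the privacy mitigations named above, differential privacy is typically applied during training by clipping each example's gradient and adding calibrated Gaussian noise (the core of the DP-SGD recipe). A schematic sketch, with hypothetical hyperparameter values:

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip each per-example gradient to clip_norm, average, then add Gaussian noise."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))  # bound influence
    mean = np.mean(clipped, axis=0)
    # Noise scaled to the clipping bound masks any single example's contribution.
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(per_example_grads),
                       size=mean.shape)
    return mean + noise

grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]  # norms 5.0 and 0.5
noisy_update = dp_sgd_step(grads)
```

The clipping bounds how much any one training example can influence the update, which is what makes the formal privacy accounting possible.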

Best practices across sectors call for transparency reports, bias audits, stakeholder-in-the-loop co-design, and robust, harmonized regulatory standards (Baradwaj et al., 2024).

5. Engineering, Evaluation, and Adaptation Methodologies

FM engineering has evolved to avoid a projected “FM crisis” of escalating complexity (Ran et al., 2024). Key developments include:

  • Declarative and unified interfaces: Model and data are treated as modular source code, with training as the compilation step; engineering best practices from software engineering are applied to manage complexity, versioning, and compliance at scale.
  • Automated data/model pipelines: Weak supervision, automated data cleaning, and hyperparameter tuning integrate with APIs and workflow DSLs for streamlined development.
  • Adapter-based scaling and dynamic multi-tier caching: Decouple base-model from task adapters for efficient serving of thousands of variants within a unified infrastructure.
  • Evaluation: Unified accuracy metrics exist for diverse tasks (e.g., brain imaging, multi-agent coordination), but benchmark fragmentation and lack of standardization remain open problems (Ghamizi et al., 16 Jun 2025, Hu et al., 9 Dec 2025).
  • Federated and privacy-preserving adaptation: Federated Transfer Learning (FTL) enables adaptation while respecting private data and model ownership constraints; taxonomies detail what/when/how to transfer, attack/defend, and optimize in distributed settings (Kang et al., 2023).
  • Parameter-efficient adaptation and continual learning: LoRA, BitFit, prompt tuning, and mixture-of-expert systems reduce compute and support federated or collaborative model co-evolution.
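Parameter-efficient methods such as LoRA freeze the pretrained weight $W$ and learn only a low-rank update $BA$, so a layer trains $r(d_{in} + d_{out})$ parameters instead of $d_{in} d_{out}$. A minimal numpy sketch (illustrative of the idea, not a framework implementation):

```python
import numpy as np

d_out, d_in, r = 64, 64, 4
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))               # zero-init so the adapter starts as a no-op

def lora_forward(x, scale=1.0):
    # Effective weight is W + scale * (B @ A); only A and B receive gradients.
    return W @ x + scale * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B = 0, the adapted layer exactly reproduces the frozen base layer.
assert np.allclose(lora_forward(x), W @ x)

trainable_frac = (A.size + B.size) / W.size
print(f"trainable params: {trainable_frac:.1%} of the base layer")
```

With $r = 4$ on a $64 \times 64$ layer, only 12.5% of the layer's parameters train; at realistic model widths the fraction is far smaller, which is what makes serving thousands of task adapters over one shared base practical.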

6. Applications Across Domains

FMs underpin advancements in a wide range of technical and industrial domains:

| Domain | FM Class/Example | Key Applications |
| --- | --- | --- |
| Language | GPT-4, Llama-3, FinBERT, BloombergGPT | QA, summarization, compliance |
| Vision | ViT, CLIP, MedSAM, DINOv2 | Image recognition, segmentation |
| Medicine | MerMED-FM, RETFound, CT-FM | Multimodal screening, triage, VQA |
| Finance | MarketGPT, FinLLaVA, TimesFM | Risk assessment, forecasting |
| Robotics | OpenVLA, π₀, LLM-agent + multimodal modules | Instruction grounding, manipulation |
| Brain Imaging | Med3D, M3AE, Med-VLP | Segmentation, VQA, diagnosis |
| Science | GPT-4 + AlphaFold, FunSearch, Logic-LM | Hypothesis generation, experiments |
| Human Motion | MoFM | Semantic action recognition |

Concrete metrics (e.g., AUROC, Dice, macro accuracy) and benchmarks are used to compare FMs against domain-specific baselines (Zhou et al., 30 Jun 2025, Pai et al., 15 Jan 2025, Chen et al., 7 Jul 2025, Baharani et al., 8 Feb 2025).
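As an example of the metrics above, the Dice coefficient used to evaluate segmentation FMs is twice the overlap between predicted and reference masks divided by their combined size. A minimal sketch:

```python
import numpy as np

def dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return float((2 * inter + eps) / (pred.sum() + target.sum() + eps))

# Toy 2x3 masks: 2 overlapping pixels, 3 predicted, 3 reference -> 4/6 ≈ 0.667.
p = np.array([[1, 1, 0], [0, 1, 0]])
t = np.array([[1, 0, 0], [0, 1, 1]])
score = dice(p, t)
```

The small `eps` term keeps the score defined when both masks are empty, a convention many segmentation benchmarks adopt.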

7. Limitations, Risks, and Future Directions

Despite their utility, FMs encounter multiple challenges:

  • Persistent gaps: Single-agent prowess does not automatically generalize to robust multi-agent intelligence; coordinated data collection, training schemes, and evaluation protocols are required (Hu et al., 9 Dec 2025).
  • Benchmarks and adaptation bottlenecks: Standardized suites and evaluation metrics for cross-domain, multimodal, and interactive agent scenarios are lacking (Ghamizi et al., 16 Jun 2025, Hu et al., 9 Dec 2025).
  • Biases and data imbalance: Underrepresented populations and modalities (e.g., in brain imaging or gastroenterology) limit equitable performance (Ghamizi et al., 16 Jun 2025, Kerdegari et al., 2024).
  • Scientific paradigm shift: FMs catalyze a three-stage evolution in scientific discovery: meta-scientific integration, human–AI co-creation, and ultimately, autonomous epistemic agents. Risks of bias amplification, reproducibility breakdown, and authorship ambiguity grow at higher levels of autonomy (Liu et al., 17 Oct 2025).
  • Theoretical frontiers: Generalization, expressivity, and dynamics are being rigorously characterized; scaling laws, phase transitions in emergence, prompt engineering theory, and self-consuming loops are vital research lines (Fu et al., 2024).

Consensus best practices recommend early multi-agent curriculum integration, modular/population-based architectures, broad and diverse benchmarking, transparency, and collaborative community governance to ensure robust, equitable, and sustainable FM ecosystems (Hu et al., 9 Dec 2025).

