Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
102 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Super-model ecosystem: A domain-adaptation perspective (2208.14092v1)

Published 30 Aug 2022 in cs.LG, cs.AI, and stat.ML

Abstract: This paper attempts to establish the theoretical foundation for the emerging super-model paradigm via domain adaptation, where one first trains a very large-scale model, {\it i.e.}, super model (or foundation model in some other papers), on a large amount of data and then adapts it to various specific domains. Super-model paradigms help reduce computational and data cost and carbon emission, which is critical to AI industry, especially enormous small and medium-sized enterprises. We model the super-model paradigm as a two-stage diffusion process: (1) in the pre-training stage, the model parameter diffuses from random initials and converges to a steady distribution; and (2) in the fine-tuning stage, the model parameter is transported to another steady distribution. Both training stages can be mathematically modeled by the Uhlenbeck-Ornstein process which converges to two Maxwell-Boltzmann distributions, respectively, each of which characterizes the corresponding convergent model. An $\mathcal O(1/\sqrt{N})$ generalization bound is then established via PAC-Bayesian framework. The theory finds that the generalization error of the fine-tuning stage is dominant in domain adaptation. In addition, our theory suggests that the generalization is determined by a new measure that characterizes the domain discrepancy between the source domain and target domain, based on the covariance matrices and the shift of the converged local minimum.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (2)
  1. Fengxiang He (46 papers)
  2. Dacheng Tao (829 papers)
Citations (1)

Summary

We haven't generated a summary for this paper yet.