Threshold characterization of distribution shift for in-context learning robustness in Transformers
Derive and prove a threshold condition on covariate-shift severity under which Transformer models retain their in-context learning capability, without relying on the Neural Tangent Kernel (NTK) framework, by adapting the covariate-shift analyses of Pathak et al. (2022) and Ma et al. (2023) to the in-context learning setting under a kernel-regressor view of large language models.
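As a minimal illustrative sketch (not the derivation itself), the snippet below adopts the kernel-regressor view of in-context learning: the in-context examples play the role of training data for a kernel ridge regressor, and queries are drawn from a shifted distribution whose severity is parameterized, for illustration only, by a Gaussian mean shift (covariate-shift analyses such as Pathak et al. (2022) and Ma et al. (2023) instead quantify severity via quantities like bounded likelihood ratios). All function names, the kernel choice, and the severity parameterization here are assumptions made for the sketch.

```python
# Hedged illustration of the conjectured phenomenon: excess risk of a kernel
# ridge regressor fit on in-context examples, evaluated on queries from an
# increasingly shifted distribution. The "severity" knob (a mean shift delta)
# is an illustrative stand-in for a formal shift-severity measure.
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between row-sets A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def kernel_ridge_predict(X_ctx, y_ctx, X_query, lam=1e-2, gamma=1.0):
    """Fit kernel ridge regression on the in-context set, evaluate on queries."""
    K = rbf_kernel(X_ctx, X_ctx, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X_ctx)), y_ctx)
    return rbf_kernel(X_query, X_ctx, gamma) @ alpha

def target_fn(X):
    """Ground-truth regression function (illustrative choice)."""
    return np.sin(3 * X[:, 0]) + 0.5 * X[:, 0]

n_ctx, n_query, noise = 64, 500, 0.1
X_ctx = rng.normal(0.0, 1.0, size=(n_ctx, 1))            # in-context inputs ~ P = N(0, 1)
y_ctx = target_fn(X_ctx) + noise * rng.normal(size=n_ctx)

# Sweep covariate-shift severity: queries drawn from Q = N(delta, 1) instead of P.
for delta in [0.0, 0.5, 1.0, 2.0, 4.0]:
    X_query = rng.normal(delta, 1.0, size=(n_query, 1))   # shifted query distribution Q
    pred = kernel_ridge_predict(X_ctx, y_ctx, X_query)
    risk = np.mean((pred - target_fn(X_query))**2)
    print(f"shift delta = {delta:.1f}  ->  excess risk ~ {risk:.4f}")
```

Running this shows risk staying near the noise floor for mild shifts and degrading sharply once queries fall outside the effective support of the in-context examples; the research question above asks for a provable threshold characterizing where that transition occurs.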
References
Without using the NTK framework, a recent work studied in-context learning of LLMs through the lens of kernel regressors; we conjecture that the theoretical analysis used in that work can be leveraged to obtain the threshold limit.
— A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models
(2401.07187 - Suh et al., 14 Jan 2024) in Section 6, Distribution Shift and Robustness (Q2)