Parametrising the Inhomogeneity Inducing Capacity of a Training Set, and its Impact on Supervised Learning (2510.18332v1)

Published 21 Oct 2025 in stat.ML and cs.LG

Abstract: We introduce parametrisation of that property of the available training dataset, that necessitates an inhomogeneous correlation structure for the function that is learnt as a model of the relationship between the pair of variables, observations of which comprise the considered training data. We refer to a parametrisation of this property of a given training set, as its inhomogeneity parameter''. It is easy to compute this parameter for small-to-large datasets, and we demonstrate such computation on multiple publicly-available datasets, while also demonstrating that conventionalnon-stationarity'' of data does not imply a non-zero inhomogeneity parameter of the dataset. We prove that - within the probabilistic Gaussian Process-based learning approach - a training set with a non-zero inhomogeneity parameter renders it imperative, that the process that is invoked to model the sought function, be non-stationary. Following the learning of a real-world multivariate function with such a Process, quality and reliability of predictions at test inputs, are demonstrated to be affected by the inhomogeneity parameter of the training data.