Universality of training trajectories in high-dimensional logistic regression
Investigate whether the optimization trajectories (e.g., gradient-descent learning curves and convergence behavior) for penalized logistic regression in the proportional high-dimensional regime are universal across input data distributions, including heavy-tailed and uniform cases, under the data augmentation settings analyzed in the paper. Specifically, ascertain whether differences in required learning rates and non-convergent behavior observed for certain distributions indicate a lack of universality in training trajectories despite the proven universality of global minima.
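The comparison described above can be sketched numerically. The following is a minimal illustration, not the paper's actual experimental pipeline: it runs full-batch gradient descent on ridge-penalized logistic regression in a proportional regime ($n = 400$, $d = 200$) and records the training-loss trajectory for Gaussian versus heavy-tailed (Student-$t$, 3 degrees of freedom) input matrices. All names, dimensions, and hyperparameters (`lam`, `lr`, `steps`) are illustrative choices, not values from the paper.

```python
import numpy as np

def loss_and_grad(w, X, y, lam):
    """Penalized logistic loss: mean log(1 + exp(-y x^T w)) + (lam/2)||w||^2."""
    z = X @ w
    # Numerically stable sigmoid of y*z via tanh.
    p = 0.5 * (1.0 + np.tanh(0.5 * y * z))
    loss = np.mean(np.logaddexp(0.0, -y * z)) + 0.5 * lam * w @ w
    grad = -(X.T @ (y * (1.0 - p))) / len(y) + lam * w
    return loss, grad

def gd_trajectory(X, y, lam=0.1, lr=0.05, steps=1000):
    """Full-batch gradient descent; returns the training-loss curve."""
    w = np.zeros(X.shape[1])
    losses = []
    for _ in range(steps):
        loss, grad = loss_and_grad(w, X, y, lam)
        losses.append(loss)
        w -= lr * grad
    return np.array(losses)

rng = np.random.default_rng(0)
n, d = 400, 200  # proportional regime: d/n = 0.5 held fixed
w_star = rng.normal(size=d) / np.sqrt(d)

def make_data(dist):
    """Labels from a logistic model; covariates Gaussian or heavy-tailed."""
    if dist == "gaussian":
        X = rng.normal(size=(n, d))
    else:  # Student-t with 3 degrees of freedom: heavy-tailed entries
        X = rng.standard_t(df=3, size=(n, d))
    p = 0.5 * (1.0 + np.tanh(0.5 * (X @ w_star)))
    y = np.where(rng.random(n) < p, 1.0, -1.0)
    return X, y

curves = {dist: gd_trajectory(*make_data(dist))
          for dist in ("gaussian", "student_t")}
for dist, losses in curves.items():
    print(f"{dist}: initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

Overlaying the two curves (e.g. with matplotlib) is one way to probe trajectory universality: if the curves differ markedly at a common learning rate, or if the heavy-tailed case requires a smaller `lr` to converge at all, that is consistent with the non-universality conjectured above even though the global minima coincide.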
References
We find that for these three setups, ${\rm LR}=0.1$ does not lead to convergence within $10^5$ steps. We conjecture that this arises due to the lack of universality of the training trajectories, as illustrated in \Cref{fig:train:trajectory} and as discussed towards the end of \Cref{sec:DA}.