Existence of a network achieving Bayes-optimal α^{-1} generalization while memorizing facts
Determine whether there exists a neural network architecture that simultaneously achieves the Bayes-optimal α^{-1} generalization rate on the teacher-rule task of the Rules-and-Facts (RAF) model and memorizes the random factual labels; if so, construct or rigorously analyze such an architecture (for example, a wide two-layer network with a trainable first layer) to establish this rate on RAF data.
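As a numerical illustration of the candidate architecture named above, the following sketch trains a wide two-layer ReLU network with a trainable first layer on a stylized RAF-like dataset: most labels follow a linear teacher rule, while a small subset of inputs carries random "fact" labels. The data model, network width, scaling, and learning rate are illustrative assumptions for this sketch, not the exact RAF specification or the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Stylized RAF-like data (illustrative assumption, not the exact RAF model) ---
d, n_rule, n_fact = 10, 400, 20
w_teacher = rng.standard_normal(d) / np.sqrt(d)

X_rule = rng.standard_normal((n_rule, d))
y_rule = np.sign(X_rule @ w_teacher)            # "rule" labels from a linear teacher
X_fact = rng.standard_normal((n_fact, d))
y_fact = rng.choice([-1.0, 1.0], size=n_fact)   # random "fact" labels to be memorized

X = np.vstack([X_rule, X_fact])
y = np.concatenate([y_rule, y_fact])

# --- Wide two-layer ReLU network; BOTH layers are trainable ---
m = 512                                          # hidden width
W = rng.standard_normal((m, d)) / np.sqrt(d)     # trainable first layer
a = rng.standard_normal(m)                       # trainable second layer

def forward(X, W, a):
    H = np.maximum(X @ W.T, 0.0)                 # hidden ReLU activations
    return H @ a / np.sqrt(m), H                 # 1/sqrt(m) output scaling

f0, _ = forward(X, W, a)
loss0 = np.mean((f0 - y) ** 2)                   # initial squared loss

# Full-batch gradient descent on the squared loss, both layers updated
lr, steps = 0.2, 3000
for _ in range(steps):
    f, H = forward(X, W, a)
    r = f - y                                    # residuals
    grad_a = 2 * H.T @ r / (len(y) * np.sqrt(m))
    G = (r[:, None] * (H > 0)) * a[None, :]      # backprop through the ReLU
    grad_W = 2 * G.T @ X / (len(y) * np.sqrt(m))
    a -= lr * grad_a
    W -= lr * grad_W

loss_final = np.mean((forward(X, W, a)[0] - y) ** 2)

# Memorization: how well the random fact labels are fitted
fact_acc = np.mean(np.sign(forward(X_fact, W, a)[0]) == y_fact)
# Generalization: accuracy on fresh inputs labelled by the teacher rule
X_test = rng.standard_normal((2000, d))
rule_acc = np.mean(np.sign(forward(X_test, W, a)[0]) == np.sign(X_test @ w_teacher))
print(f"loss {loss0:.3f} -> {loss_final:.3f}, "
      f"fact acc {fact_acc:.2f}, rule acc {rule_acc:.2f}")
```

This only demonstrates the empirical phenomenon the question asks about (simultaneous rule learning and fact fitting on one synthetic instance); establishing the Bayes-optimal α^{-1} rate for such a network is precisely the open analytical problem.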
References
This opens the following intriguing question: does there exist a neural network that can reach a generalization rate of α^{-1} on data drawn from the RAF model while simultaneously memorizing the facts? Our work indicates that linear and kernel methods are insufficient for this purpose. It is possible that a wide two-layer neural network with a trainable first layer (as opposed to the fixed features considered in this work) would achieve this goal. However, the analysis of such a neural network learning on RAF data remains a technically open problem, which we leave for future investigation.