- The paper demonstrates that feedforward ReLU networks quickly become linear outside the training range, so they extrapolate nonlinear functions poorly, yet can extrapolate linear functions when the training data covers all directions.
- It shows that graph neural networks extrapolate on algorithmic tasks such as max degree and shortest path when task-specific nonlinearities (e.g., max- or min-aggregation) are encoded in the architecture.
- The study provides a roadmap for designing robust AI systems by integrating architectural insights with strategic training data geometry to improve extrapolation.
Extrapolation Capabilities of Neural Networks: Feedforward and Graph Neural Networks
In their work titled "How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks," the authors study how neural networks trained with gradient descent behave outside the support of the training distribution, focusing on feedforward networks (MLPs) and Graph Neural Networks (GNNs). The paper is motivated by a puzzle in prior empirical work: GNNs have extrapolated successfully on certain algorithmic tasks, while MLPs often fail to extrapolate even simple nonlinear functions.
The authors first examine the extrapolation behavior of MLPs. They show that ReLU MLPs trained by gradient descent quickly converge, along any direction from the origin, to linear functions outside the support of the training data. Because ReLU activations are piecewise linear, the network settles into a fixed linear region far from the training data and therefore cannot track most nonlinear target functions. MLPs can, however, reliably extrapolate linear target functions when the training distribution is "diverse" in a geometric sense, covering all directions in the input space.
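This linear out-of-distribution behavior is easy to observe directly. Below is a minimal sketch (assuming PyTorch; the architecture and training setup are illustrative, not the paper's exact configuration): a ReLU MLP fit to y = x² on [-1, 1] develops a nearly constant slope far outside the training range, while the true slope keeps growing.

```python
# Minimal sketch: fit a ReLU MLP to y = x^2 on [-1, 1], then probe points
# far outside the training range. Out-of-distribution, the learned function
# becomes linear along each direction, so it cannot track quadratic growth.
import torch

torch.manual_seed(0)
x = torch.linspace(-1, 1, 256).unsqueeze(1)   # training support: [-1, 1]
y = x ** 2                                     # nonlinear target

mlp = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(mlp(x), y)
    loss.backward()
    opt.step()

with torch.no_grad():
    # Finite-difference slopes far from the data: the model's slope
    # stabilizes (linear behavior), while the true slope 2x keeps growing.
    for a in (2.0, 4.0, 8.0):
        s = (mlp(torch.tensor([[a + 1.0]])) - mlp(torch.tensor([[a]]))).item()
        print(f"slope near x={a:.0f}: model = {s:.2f}, true = {2*a + 1:.2f}")
```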
The paper extends these insights to explain why GNNs extrapolate well on certain complex tasks. The authors observe that many algorithms solvable by dynamic programming (DP), such as shortest-path computations on graphs, align structurally with GNN message passing. This leads to their "linear algorithmic alignment" hypothesis: GNNs extrapolate successfully when the task-specific nonlinearities are encoded in the architecture or input features, so that the MLP modules inside the GNN only need to learn the linear pieces of the computation.
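To make the alignment concrete, here is a toy NumPy sketch (the graph and function names are hypothetical) of one min-aggregation message-passing step on a weighted graph. It mirrors the Bellman-Ford update d[u] = min over neighbors v of (d[v] + w(v,u)): with min built into the aggregator, the learnable message function only has to represent the linear map (d_v, w) -> d_v + w.

```python
# Illustrative sketch: one min-aggregation message-passing step mirrors a
# Bellman-Ford update. With min built into the aggregator, the stand-in for
# the GNN's MLP module only needs the *linear* map d_v + w_vu, which is
# exactly the kind of function MLPs extrapolate well.
import numpy as np

INF = np.inf
# Weighted edges (v, u, w): hypothetical 4-node graph, source node 0.
edges = [(0, 1, 2.0), (0, 2, 5.0), (1, 2, 1.0), (2, 3, 2.0)]
d = np.array([0.0, INF, INF, INF])  # current distance estimates

def linear_message(d_v, w_vu):
    # Stand-in for the GNN's learnable message function: purely linear.
    return d_v + w_vu

def min_aggregation_step(d, edges):
    new_d = d.copy()
    for v, u, w in edges:
        # Aggregate incoming messages with min, as a min-aggregation GNN would.
        new_d[u] = min(new_d[u], linear_message(d[v], w))
    return new_d

for _ in range(3):           # |V| - 1 rounds suffice on this toy graph
    d = min_aggregation_step(d, edges)
print(d)                     # -> [0. 2. 3. 5.]
```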
Empirical analysis supports these theoretical insights. Through controlled experiments, the authors demonstrate how architectural choices (e.g., max- or min-aggregation) enable GNNs to extrapolate on tasks such as max degree and shortest path. They also validate the role of training data diversity, showing that the geometry of the training set significantly affects extrapolation accuracy.
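The max degree case follows the same pattern. In this toy sketch (NumPy, hypothetical example graph), sum-aggregating constant messages yields each node's degree, a linear computation, and a max-readout then selects the maximum exactly for graphs of any size; a sum-readout would instead push the nonlinearity into the MLP modules, which is where extrapolation breaks down.

```python
# Toy sketch: with the right aggregations, max degree factors into linear
# pieces. Sum-aggregation of all-ones messages computes each node's degree
# (linear), and a max-readout encodes the task's nonlinearity exactly.
import numpy as np

def max_degree(adj):
    degrees = adj.sum(axis=1)    # sum-aggregation of all-ones messages
    return degrees.max()         # max-readout: the task-specific nonlinearity

star = np.array([[0, 1, 1, 1],   # hypothetical star graph on 4 nodes
                 [1, 0, 0, 0],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0]])
print(max_degree(star))          # -> 3, exact regardless of graph size
```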
The primary implication of this research is a roadmap for designing neural networks that extrapolate. For MLPs, this means making the target function linear in the chosen representation (transforming the input features if necessary) and ensuring the training samples cover all directions. For GNNs, it means encoding task-specific nonlinearities directly in the architecture, so that the MLP modules only have to learn the linear parts of the computation.
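As an illustration of the feature-transformation route (a hedged sketch assuming PyTorch; the task y = x₁x₂ is a hypothetical example in the spirit of the paper's roadmap, not its exact experiment): taking logs turns multiplication into a linear function of the transformed features, after which even a plain linear layer extrapolates far outside the training range.

```python
# Hedged sketch: y = x1 * x2 is nonlinear in (x1, x2), so an MLP trained on
# raw features extrapolates it poorly. In log space the target is linear
# (log y = log x1 + log x2), matching the roadmap: make the learnable part
# linear, then cover the input directions with training data.
import torch

torch.manual_seed(0)
x = torch.rand(512, 2) * 4 + 1          # train on [1, 5]^2
y = (x[:, :1] * x[:, 1:]).log()         # predict log y: linear in log x

model = torch.nn.Linear(2, 1)           # a linear map suffices after transform
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(1000):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x.log()), y)
    loss.backward()
    opt.step()

x_far = torch.tensor([[50.0, 80.0]])    # far outside the training range
pred = model(x_far.log()).exp().item()
print(f"predicted {pred:.1f} vs true {50 * 80}")
```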
This work has significant implications for extending deep learning to domains where extrapolation beyond the training distribution is crucial. It also lays the groundwork for more robust AI systems that perform well in dynamically changing environments or when confronted with previously unseen scenarios. Future frameworks may build on these principles to strengthen generalization and broaden the use of neural networks in real-world problem-solving.