
Linear-Size Neural Network Representation of Piecewise Affine Functions in $\mathbb{R}^2$

Published 17 Mar 2025 in cs.LG, cs.NE, math.MG, and stat.ML | arXiv:2503.13001v1

Abstract: It is shown that any continuous piecewise affine (CPA) function $\mathbb{R}^2\to\mathbb{R}$ with $p$ pieces can be represented by a ReLU neural network with two hidden layers and $O(p)$ neurons. Unlike prior work, which focused on convex pieces, this analysis considers CPA functions with connected but potentially non-convex pieces.

Summary

  • The paper proves that any continuous piecewise affine function on ℝ² with p pieces is exactly represented by a two-hidden-layer ReLU network using O(p) neurons.
  • The methodology uses a two-stage construction: a first hidden layer that implements gate-like ReLU structures and a second layer that linearly combines these outputs to capture region-specific affine mappings.
  • The work extends previous results that were limited to convex partitions, showing that even non-convex regions can be handled exactly by a linear-size architecture, i.e., one whose neuron count grows only linearly with the number of pieces.

Overview

The paper rigorously analyzes the representability of continuous piecewise affine (CPA) functions on ℝ² by neural networks with ReLU activations. In particular, it considers a CPA function whose domain is partitioned into p connected, possibly non-convex regions, and establishes that such a function can be represented exactly by a ReLU network with only two hidden layers and O(p) neurons. This extends previous results, which were largely limited to functions defined over convex regions, thereby broadening the scope of neural network expressivity in modeling more complex piecewise affine landscapes.
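In symbols, the result can be paraphrased as follows (the piece notation $\Omega_i$ and the weight names are illustrative choices, not the paper's own): for any CPA function

$$ f(x) = a_i^{\top} x + b_i \quad \text{for } x \in \Omega_i, \qquad i = 1, \dots, p, $$

whose pieces $\Omega_1, \dots, \Omega_p$ are connected (possibly non-convex) and cover $\mathbb{R}^2$, there exist weights and biases such that

$$ f(x) = w^{\top} \sigma\big(W_2\, \sigma(W_1 x + c_1) + c_2\big) + c_3 \quad \text{for all } x \in \mathbb{R}^2, $$

where $\sigma(t) = \max(t, 0)$ acts coordinatewise and the two hidden layers together contain $O(p)$ neurons.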

Methodology

The central methodology is a constructive approach to network design. The construction starts from the given partition of the input space into p distinct regions, on each of which the CPA function agrees with a different affine expression. The key technical innovation is the careful handling of non-convex partitions: unlike a convex region, a non-convex region cannot be carved out by a single conjunction of linear inequalities, so representing it requires a nontrivial combination of ReLU activations.

The architecture is built in two stages (a toy sketch illustrating both stages follows the list):

  • First Hidden Layer: The first hidden layer implements gate-like structures using ReLU units to effectively “detect” the boundaries of each piece. This involves encoding conditions based on the linear inequalities that define the boundaries of the regions.
  • Second Hidden Layer: In the subsequent layer, the outputs of these gate functions are linearly combined in such a way that the affine expression corresponding to the active region is selected. The design leverages the compositional structure of ReLU activations to replicate the affine mappings within each partition.
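
To make the two-stage pattern concrete, the sketch below (a minimal NumPy illustration under toy choices of weights, not the paper's actual construction) exactly represents the CPA function f(x, y) = max(0, min(x, y)), whose three pieces include a connected but non-convex zero region:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Target: f(x, y) = max(0, min(x, y)).
# Pieces: {x <= 0 or y <= 0} -> 0   (connected, non-convex),
#         {0 < x <= y}       -> x,
#         {0 < y <  x}       -> y.
def target(xy):
    x, y = xy
    return max(0.0, min(x, y))

# First hidden layer ("gates"): h = (ReLU(x - y), ReLU(x), ReLU(-x)).
W1 = np.array([[ 1.0, -1.0],
               [ 1.0,  0.0],
               [-1.0,  0.0]])
b1 = np.zeros(3)

# Second hidden layer: one unit computing ReLU(min(x, y)),
# using min(x, y) = x - ReLU(x - y) = h[1] - h[2] - h[0].
W2 = np.array([[-1.0, 1.0, -1.0]])
b2 = np.zeros(1)

# Affine output layer: read out the single second-layer unit.
w3, b3 = np.array([1.0]), 0.0

def network(xy):
    h = relu(W1 @ xy + b1)   # stage 1: gate-like ReLU features
    g = relu(W2 @ h + b2)    # stage 2: select the active affine piece
    return float(w3 @ g + b3)

# Exactness check on random points in [-5, 5]^2.
rng = np.random.default_rng(0)
pts = rng.uniform(-5.0, 5.0, size=(1000, 2))
assert all(np.isclose(network(p), target(p)) for p in pts)
print("two-hidden-layer ReLU net matches the CPA target exactly on all samples")
```

Here the first layer provides the gate-like features ReLU(x - y), ReLU(x), and ReLU(-x); the second layer applies a ReLU to their linear combination x - ReLU(x - y) = min(x, y), which selects the correct affine expression on each piece. The paper's construction generalizes this kind of selection to arbitrary CPA functions with p connected pieces while keeping the total neuron count at O(p).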

A notable aspect of the construction is its linear scalability: the number of neurons required to represent any given CPA function scales linearly with the number of pieces p, i.e., O(p). This result is established through detailed combinatorial and geometric arguments on the partition properties of the input space.

Main Results and Contributions

The paper’s primary contribution is the demonstration that two-hidden-layer ReLU networks suffice to capture exactly any continuous piecewise affine function over ℝ² with p pieces, even when the pieces are non-convex. Key contributions include:

  • Constructive Upper Bound: The paper provides explicit constructions that yield networks with O(p) neurons. The construction is algorithmic, offering a practical method for translating a CPA function’s definition into a concrete neural network architecture.
  • Generalization to Non-Convex Pieces: Prior work focused mostly on convex partitions which simplify the combinatorial structure of the representation. This paper removes this restriction and develops techniques to manage non-convex regions, significantly expanding the class of functions that can be modeled by such networks.
  • Optimal Resource Allocation: Through a theoretical analysis, the paper reveals that the two-hidden-layer architecture is optimally resource-efficient in terms of neuron count relative to the complexity (number of pieces) of the function. This supports the paradigm that even shallow networks (with just two hidden layers) can capture high-complexity behaviors if designed appropriately.

The explicit network construction and analysis provide concrete quantitative guarantees on the network size. The O(p) scaling is particularly compelling: increasing the complexity of the function (by increasing p) leads to only a linear increase in network size, rather than an exponential one.

Comparison with Prior Work

Previous studies have largely concentrated on CPA functions whose pieces are convex. Convexity simplifies both the geometric and the algebraic aspects of the representation: when the pieces are convex, ReLU activations naturally lend themselves to demarcating linear regions through hyperplane arrangements.
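
As background, and as standard CPA theory rather than a result of this paper: if the affine components of $f$ are $\ell_1, \dots, \ell_q$, then $f$ always admits a max-min (lattice) representation

$$ f(x) = \max_{i} \, \min_{j \in S_i} \ell_j(x) $$

for suitable index sets $S_i \subseteq \{1, \dots, q\}$, and a convex piece of the partition is a finite intersection of half-planes $\{x : a_k^{\top} x \le b_k\}$. Both facts connect directly to ReLU networks, since $\max(u, v) = v + \sigma(u - v)$ and $\min(u, v) = v - \sigma(v - u)$ each cost only a constant number of ReLU units; non-convex pieces break the simple half-plane description, which is exactly the obstacle the paper's construction addresses.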

In contrast, the present work addresses functions with regions that are connected but can be non-convex. This introduces additional technical challenges, as non-convex regions require partitioning strategies that cannot rely solely on the standard convex separation by hyperplanes. The paper provides a refined argument on how to decompose the non-convex regions into tractable segments managed by the network, showcasing that the difficulty introduced by non-convexity can nonetheless be counterbalanced by an appropriately designed two-hidden-layer architecture.

Furthermore, the paper’s emphasis on a linear-size representation (i.e., O(p) neurons) marks a significant improvement over the more general bounds that might otherwise be expected when representing arbitrarily complex partitioned functions. This efficient representation underlines the potential for practical implementations in geometric deep learning contexts and other applications where piecewise affine structures are prevalent.

Conclusion

The study provides a comprehensive framework for representing continuous piecewise affine functions in ℝ² with a tailored two-hidden-layer ReLU network. By successfully extending the theory to functions with non-convex partitions, the work not only generalizes previous results but also offers an efficient (linear-size) neural network architecture that exactly replicates the target function. This result is particularly significant for theoretical investigations into the expressive power of neural networks and has potential implications for practical applications in areas that require precise modeling of piecewise affine systems.
