An Optimal Sauer Lemma Over $k$-ary Alphabets

Published 14 Apr 2026 in cs.LG, math.CO, and stat.ML | (2604.12952v1)

Abstract: The Sauer-Shelah-Perles Lemma is a cornerstone of combinatorics and learning theory, bounding the size of a binary hypothesis class in terms of its Vapnik-Chervonenkis (VC) dimension. For classes of functions over a $k$-ary alphabet, namely the multiclass setting, the Natarajan dimension has long served as an analogue of VC dimension, yet the corresponding Sauer-type bounds are suboptimal for alphabet sizes $k>2$. In this work, we establish a sharp Sauer inequality for multiclass and list prediction. Our bound is expressed in terms of the Daniely–Shalev-Shwartz (DS) dimension, and more generally with its extension, the list-DS dimension, the combinatorial parameters that characterize multiclass and list PAC learnability. Our bound is tight for every alphabet size $k$, list size $\ell$, and dimension value, replacing the exponential dependence on $\ell$ in the Natarajan-based bound by the optimal polynomial dependence, and improving the dependence on $k$ as well. Our proof uses the polynomial method. In contrast to the classical VC case, where several direct combinatorial proofs are known, we are not aware of any purely combinatorial proof in the DS setting. This motivates several directions for future research, which are discussed in the paper. As consequences, we obtain improved sample complexity upper bounds for list PAC learning and for uniform convergence of list predictors, sharpening the recent results of Charikar et al. (STOC 2023), Hanneke et al. (COLT 2024), and Brukhim et al. (NeurIPS 2024).



Summary

The paper introduces an optimal Sauer Lemma for k-ary alphabets, establishing a tight upper bound via the DS dimension.
It employs the polynomial method to replace exponential dependencies on the list size with polynomial ones, enhancing sample complexity in multiclass and list learning.
The results deepen our understanding of multiclass hypothesis structures and highlight open questions regarding a purely combinatorial proof.

Optimal Sauer-Shelah-Perles Inequality for $k$-ary Alphabets

Introduction and Motivation

The paper "An Optimal Sauer Lemma Over $k$-ary Alphabets" (2604.12952) addresses the classical problem of bounding the size of hypothesis classes in learning theory, generalizing the celebrated Sauer-Shelah-Perles Lemma from the binary setting to multiclass scenarios, i.e., classes of functions $\mathcal{H} \subseteq [k]^n$. While the binary case is governed by the VC dimension, the multiclass setting has lacked an equally precise analog: prior bounds (notably those based on the Natarajan dimension) exhibit suboptimal dependence on the alphabet size and the list size. The work establishes a tight combinatorial inequality for multiclass and list learning via the Daniely–Shalev-Shwartz (DS) dimension and its list extension, closing this gap in the literature.

Classical and Multiclass Sauer Bounds

The classical Sauer-Shelah-Perles Lemma bounds the size of a binary hypothesis class $\mathcal{H} \subseteq \{0,1\}^n$ of VC dimension $d$: $|\mathcal{H}| \leq \sum_{i=0}^{d} \binom{n}{i}$. This underpins PAC learnability and uniform convergence for binary classification. For multiclass settings, Natarajan's dimension was proposed, but its associated Sauer-type inequality

$|\mathcal{H}| \lesssim \ell^{n-d} n^d k^{(\ell+1)d}$

(where $\ell$ is a list size parameter) is only tight for $k=2$, and fails to capture the optimal growth for larger $k$ and $\ell$. The dependence on $\ell$ is exponential in the worst case, and the dependence on $k$ is unnecessarily pessimistic.
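As a concrete check of the classical binary bound, the following minimal sketch evaluates $\sum_{i=0}^{d} \binom{n}{i}$ and computes the VC dimension of a small class by brute force. The function names `sauer_bound` and `vc_dimension` and the example class `ball` are illustrative choices made here, not taken from the paper, and the search is exponential in $n$, so it is only meant for toy inputs.

```python
from itertools import combinations, product
from math import comb


def sauer_bound(n: int, d: int) -> int:
    """Right-hand side of the classical lemma: sum_{i=0}^{d} C(n, i)."""
    return sum(comb(n, i) for i in range(d + 1))


def vc_dimension(hypotheses: set, n: int) -> int:
    """Brute-force VC dimension of a non-empty class H inside {0,1}^n."""
    best = 0
    for size in range(1, n + 1):
        shattered_some = False
        for coords in combinations(range(n), size):
            # Restrict every hypothesis to the chosen coordinates.
            patterns = {tuple(h[i] for i in coords) for h in hypotheses}
            if len(patterns) == 2 ** size:  # all binary patterns appear
                shattered_some = True
                break
        if not shattered_some:
            # Shattering is downward closed, so larger sets cannot be shattered.
            break
        best = size
    return best


# Example (hypothetical): all binary vectors of length 5 with at most two ones.
ball = {h for h in product((0, 1), repeat=5) if sum(h) <= 2}
print(vc_dimension(ball, 5), len(ball), sauer_bound(5, 2))
```

In this example the class has VC dimension 2 and exactly $1 + 5 + 10 = 16$ elements, so it meets the classical bound with equality.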

DS Dimension and Sharp Sauer-Type Bound

Recent advances show that the Daniely–Shalev-Shwartz (DS) dimension (the $\ell$-DS dimension when list prediction is considered) precisely characterizes multiclass and list learnability. The paper works with $\ell$-pseudo-cubes, which generalize cubes, and defines the $\ell$-DS dimension as the maximal cardinality of a subset shattered by an $\ell$-pseudo-cube. The main theorem establishes a tight upper bound on $|\mathcal{H}|$ in terms of $n$, $k$, $\ell$, and $d$, where $d$ is the $\ell$-DS dimension. This result aligns with extremal constructions and is tight for all relevant parameters, improving the exponential dependence on $\ell$ in prior art to a polynomial one and also optimizing the dependence on $k$.
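To make the definition concrete, here is a brute-force sketch of the $\ell$-DS dimension for tiny classes. It assumes the usual definition from the list-learning literature, which this summary does not spell out: an $\ell$-pseudo-cube is a non-empty finite set of patterns in which every pattern has at least $\ell$ neighbours along every coordinate, where a neighbour along coordinate $i$ differs from the pattern at $i$ only. The function names are hypothetical, and the running time is exponential.

```python
from itertools import combinations, product


def largest_pseudo_cube(patterns: set, ell: int = 1) -> set:
    """Union of all ell-pseudo-cubes contained in `patterns` (may be empty).

    Patterns lacking ell neighbours along some coordinate are discarded
    repeatedly until no further pattern can be removed.
    """
    cube = set(patterns)
    changed = True
    while changed:
        changed = False
        for p in list(cube):
            for i in range(len(p)):
                neighbours = sum(
                    1
                    for q in cube
                    if q[i] != p[i] and q[:i] == p[:i] and q[i + 1:] == p[i + 1:]
                )
                if neighbours < ell:
                    cube.discard(p)
                    changed = True
                    break
    return cube


def ds_dimension(hypotheses: set, n: int, ell: int = 1) -> int:
    """Largest size of a coordinate set whose restriction contains an ell-pseudo-cube."""
    best = 0
    for size in range(1, n + 1):
        found = False
        for coords in combinations(range(n), size):
            restricted = {tuple(h[i] for i in coords) for h in hypotheses}
            if largest_pseudo_cube(restricted, ell):
                found = True
                break
        if not found:
            # Restrictions of pseudo-cubes are pseudo-cubes, so stop early.
            break
        best = size
    return best


binary_cube = set(product((0, 1), repeat=3))
ternary_square = set(product(range(3), repeat=2))
print(ds_dimension(binary_cube, 3, ell=1))
print(ds_dimension(binary_cube, 3, ell=2))
print(ds_dimension(ternary_square, 2, ell=2))
```

The second call illustrates why lists matter: in $\{0,1\}^3$ every pattern has only one neighbour per coordinate, so with $\ell = 2$ no coordinate is shattered, whereas the full ternary square has two neighbours per coordinate and is shattered for $\ell = 2$.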

Proof Technique and Structural Observations

The argument is rooted in the polynomial method, constructing suitable vector spaces and sets of indicator functions and monomials such that the dimension counting yields the claimed combinatorial bound. Notably, this proof is algebraic, diverging from the rich family of combinatorial proofs known for the binary Sauer Lemma. The lack of a purely combinatorial proof in the DS setting is identified as an important open problem, with implications for sample complexity in multiclass PAC learning.
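For orientation, the polynomial method already gives a short proof in the classical binary case. The sketch below is the standard linear-algebra argument for $\mathcal{H} \subseteq \{0,1\}^n$ of VC dimension $d$. It is not the paper's DS-dimension proof, and the symbols $V$, $x_S$, $b$, $p_S$, and $c_T$ are notation introduced here for illustration.

```latex
% Classical binary case only: \mathcal{H} \subseteq \{0,1\}^n with VC dimension d.
\begin{align*}
&\text{(1) Function space:} && V = \{ f : \mathcal{H} \to \mathbb{R} \}, \quad \dim V = |\mathcal{H}|. \\
&\text{(2) Monomials:} && x_S(h) = \prod_{i \in S} h_i \ (S \subseteq [n]) \text{ span } V. \\
&\text{(3) Unshattered } S,\ |S| = d+1: && \exists\, b \in \{0,1\}^S \text{ with } h|_S \neq b \text{ for all } h \in \mathcal{H}. \\
&\text{(4) Vanishing polynomial:} && p_S(h) = \prod_{i \in S} (h_i + b_i - 1) = 0 \text{ for all } h \in \mathcal{H}. \\
&\text{(5) Expansion:} && p_S = x_S + \sum_{T \subsetneq S} c_T\, x_T
   \ \Rightarrow\ x_S \in \operatorname{span}\{ x_T : T \subsetneq S \}. \\
&\text{(6) Counting:} && |\mathcal{H}| = \dim V \leq \#\{ T \subseteq [n] : |T| \leq d \}
   = \sum_{i=0}^{d} \binom{n}{i}.
\end{align*}
```

Step (4) holds because a factor $h_i + b_i - 1$ is zero exactly when $h_i \neq b_i$, so the product is non-zero only if $h$ agrees with the missing pattern $b$ on all of $S$. For $|S| > d+1$, writing $x_S = x_{S'} \, x_{S \setminus S'}$ with $|S'| = d+1$ and applying step (5) to $x_{S'}$ lowers the degree, so by induction the monomials of degree at most $d$ already span $V$, which gives step (6).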

The paper also details connections between pseudo-cubes, bipartite graphs, and classical Turán-type extremal problems, particularly highlighting gaps in existing Natarajan-based bounds through explicit examples.
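The bipartite-graph view can be made concrete for $n = 2$: a class $\mathcal{H} \subseteq [k]^2$ is a set of edges between values of the first coordinate and values of the second. Under the standard definitions (assumed here, not quoted from the paper), such a class is a pseudo-cube when every edge has another edge at each of its endpoints, and both coordinates are Natarajan-shattered exactly when the graph contains a 4-cycle. The 6-cycle below is an illustration chosen here, not necessarily one of the paper's examples, and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical class H inside {0,1,2}^2, read as the edges (left, right) of a 6-cycle.
hexagon = {(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 0)}


def is_pseudo_cube_2d(edges: set) -> bool:
    """True if the set is non-empty and every endpoint of every edge has degree >= 2."""
    if not edges:
        return False
    for a, b in edges:
        left_degree = sum(1 for a2, _ in edges if a2 == a)
        right_degree = sum(1 for _, b2 in edges if b2 == b)
        if left_degree < 2 or right_degree < 2:
            return False
    return True


def has_four_cycle(edges: set) -> bool:
    """True if some {a, a2} x {b, b2} with a != a2 and b != b2 lies inside the set."""
    for (a, b), (a2, b2) in combinations(edges, 2):
        if a != a2 and b != b2 and (a, b2) in edges and (a2, b) in edges:
            return True
    return False


print(is_pseudo_cube_2d(hexagon))  # pseudo-cube: both coordinates are DS-shattered
print(has_four_cycle(hexagon))     # no 4-cycle: the pair is not Natarajan-shattered
```

The 6-cycle is a pseudo-cube without a 4-cycle, so this class has DS dimension 2 but Natarajan dimension 1. Asking how many edges a bipartite graph can have while avoiding such 4-cycles is a Turán-type (Zarankiewicz) question, which is one way to see why Natarajan-based counting can be loose.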

Applications: PAC Learning and Uniform Convergence

The sharpened DS-based combinatorial inequality yields strong quantitative consequences in learning theory:

List PAC Learning: For $\ell$-list PAC learning of concept classes $\mathcal{H}$ of finite $\ell$-DS dimension $d$, the sample complexity upper bound is improved. The new bound has no dependence on the alphabet size $k$ and depends only polynomially on the list size $\ell$. This is a significant improvement over previous bounds, e.g., those based on the Natarajan dimension, in [charikar2023characterization] and [brukhim2024multiclass].

List Uniform Convergence: The sample complexity of uniform convergence for classes of $\ell$-list predictors with $\ell$-DS dimension $d$ is likewise improved, removing the quadratic dependence incurred by prior work.

These results sharpen the learning-theoretic guarantees for multiclass and list learning, and are consequential for applications such as recommendation systems and top-$\ell$ loss classification, where outputting lists of candidates is essential.

Combinatorial and Algebraic Open Directions

The paper explores relationships between the Natarajan and DS dimensions, maximum classes for the DS dimension, and whether these classes retain the structural richness of VC-maximum classes. The authors emphasize the importance of developing combinatorial proofs for the DS Sauer Lemma, which may lead to further reductions in sample complexity and to a deeper understanding of extremal multiclass classes.

Additionally, connections with algebraic approaches to Sauer-type bounds for various combinatorial dimensions (such as Recursive Teaching dimension and Graph dimension) are discussed, suggesting avenues for unification and deeper analysis.

Implications and Future Perspectives

The optimal Sauer-type bound for $k$-ary alphabets via the DS dimension enables:

Tighter theoretical analysis for multiclass PAC and list learning, directly influencing statistical learning theory and practical algorithm design.
Reduction in sample complexity for high-cardinality multiclass and top-$\ell$ settings, removing unnecessary exponential penalties and mitigating the curse of dimensionality.
Enhanced understanding of extremal hypothesis class structure, which may inform approaches to sample compression, boosting, and data-dependent learning.
Potential further improvements contingent on combinatorial insights, particularly for closing the gap between upper and lower sample complexity bounds in multiclass PAC learning.

The polynomial method's success here reaffirms the efficacy of algebraic combinatorics in learning theory, but the pursuit of combinatorial proof techniques remains both theoretically significant and practically promising.

Conclusion

This work fundamentally refines and optimizes Sauer-type bounds for multiclass and list prediction problems, transitioning from the Natarajan dimension to the DS dimension as the governing combinatorial parameter. The resulting inequality is tight, yields improved sample complexity and uniform convergence rates, and has broad ramifications for both machine learning and extremal combinatorics. The absence of an explicit combinatorial proof in the DS setting, and the structural properties of DS-maximum classes, are highlighted as compelling directions for further research.
