Polynomial Padding: Theory & Applications

Updated 23 March 2026

Polynomial padding is the augmentation of polynomials by adding extra terms, enabling uniform embeddings and degree alignment in various algebraic and computational settings.
It plays a crucial role in applications such as fast polynomial multiplication, complexity comparisons between determinants and permanents, and expanding the expressive power of transformer architectures.
The technique underpins geometric and algebraic structures through invariant varieties and specific module decompositions, serving as a foundation for both theoretical insights and practical algorithms.

Polynomial padding refers to the process of augmenting polynomials or polynomial-based constructions by introducing additional terms—typically as powers of a linear form or a special variable—in order to serve specific algebraic, computational, or structural goals. The technique manifests prominently in algebraic complexity theory, symbolic computation (e.g., fast polynomial multiplication), and neural network architectures leveraging sequence padding for enhanced expressive power. The core idea is to embed or transform given polynomials into higher-dimensional or higher-degree settings, thereby enabling uniformity, facilitating comparisons, or enabling structural decompositions that would otherwise be impossible or unwieldy.

1. Algebraic Definitions and Varieties of Padded Polynomials

Let $V$ be a complex vector space of dimension $w$, and consider the space of homogeneous polynomials of degree $d$ on $V$, denoted $\operatorname{Sym}^d V^*$. A polynomial $P \in \operatorname{Sym}^d V^*$ is called $k$-padded if there exist a nonzero linear form $L\in V^*$ and a homogeneous polynomial $Q\in\operatorname{Sym}^{m}V^*$, with $m=d-k$, such that

$P(x) = (L(x))^k Q(x).$

This expresses $P$ as divisible by $L^k$. The set of $k$-padded polynomials, denoted $V_{k,d}$, forms a $\mathrm{GL}(V)$-invariant, irreducible, Zariski-closed subvariety of the projective space $\mathbb{P}(\operatorname{Sym}^d V^*)$.
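
As a small worked instance of the definition (the coordinates $x, y$ and the specific polynomials are chosen here purely for illustration and do not come from the cited source), take $V = \mathbb{C}^2$ with $d = 3$ and $k = 2$, so that $m = d - k = 1$:

$P(x,y) = x^3 + x^2 y = x^2 (x + y), \qquad L = x, \quad Q = x + y \in \operatorname{Sym}^{1} V^*.$

Hence $[P] \in V_{2,3}$. By contrast, $x^3 + y^3$ factors into three pairwise distinct linear forms, so it is divisible by no square $L^2$ and is not $2$-padded.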

The defining equations of $V_{k,d}$ are quadratic: the variety is cut out set-theoretically by equations described through explicit $\mathrm{GL}(V)$-module decompositions into Schur functors $S_{(2(k-j),\,2(m+j))} V$ for $j=1,\ldots,k$. The full homogeneous ideal $I(V_{k,d})$ in degree $\delta$ coincides with the kernel of a generalized Foulkes–Howe map:

$F^{(\delta)}_{k,d}: \operatorname{Sym}^\delta (\operatorname{Sym}^d V) \to \operatorname{Sym}^{\delta(d-k)} V \otimes \operatorname{Sym}^{\delta k} V.$

The coordinate ring of the normalization of $V_{k,d}$ is multiplicity-free and given as

$\widetilde{R} = \bigoplus_{\delta\ge 0} \operatorname{Sym}^{\delta(d-k)} V \otimes \operatorname{Sym}^{\delta k} V,$

which can be seen as the coordinate ring of the Segre–Veronese embedding of $\mathbb{P}(V^*) \times \mathbb{P}(V^*)$ with bidegree $(d-k,k)$ (Kadish et al., 2012).

2. Polynomial Padding in Algebraic Complexity Theory

Polynomial padding is central to the comparison between the determinant and the permanent within the framework of algebraic complexity theory. Specifically, in Valiant’s approach to establishing that the determinant does not efficiently simulate the permanent, one considers the size-$m$ permanent:

$\operatorname{Perm}_m(x_{ij}) = \sum_{\sigma\in S_m} \prod_{i=1}^m x_{i, \sigma(i)}$

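and expresses it as a specialization of the $n \times n$ determinant by padding: one seeks an $n \times n$ matrix

$Y = \big(\ell_{ij}(x_{11}, x_{12}, \ldots, x_{mm}, z)\big)_{1 \le i,j \le n}$

whose entries $\ell_{ij}$ are linear forms in the $m^2$ variables $x_{ij}$ and one additional padding variable $z$, such that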

$z^{n-m} \operatorname{Perm}_m(x) = \operatorname{Det}_n(Y).$

The factor $z^{n-m}$ is the padding, which raises the degree from $\deg \operatorname{Perm}_m = m$ to $n$, aligning it with $\deg \operatorname{Det}_n = n$. The determinantal complexity of $\operatorname{Perm}_m$ is then defined via the minimal $n$ admitting such a specialization, and padding is essential in the classical approach (Gesmundo et al., 2017).
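
The padding identity can be checked numerically in the smallest case. The sketch below is illustrative only: the helper `padded_matrix` is a hypothetical name, and the construction works for $m = 2$ solely because $\operatorname{Perm}_2$ is itself a $2 \times 2$ determinant after one sign change. No comparably simple matrix is claimed for larger $m$.

```python
from itertools import permutations


def permanent(mat):
    """Permanent via the defining sum over all permutations."""
    n = len(mat)
    total = 0
    for sigma in permutations(range(n)):
        prod = 1
        for i in range(n):
            prod *= mat[i][sigma[i]]
        total += prod
    return total


def determinant(mat):
    """Determinant via the Leibniz formula (fine for tiny matrices)."""
    n = len(mat)
    total = 0
    for sigma in permutations(range(n)):
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j]
        )
        prod = -1 if inversions % 2 else 1
        for i in range(n):
            prod *= mat[i][sigma[i]]
        total += prod
    return total


def padded_matrix(x, z, n):
    """Hypothetical helper: an n x n matrix Y, with entries that are
    linear forms in x_ij and z, such that det(Y) = z^(n-2) * Perm_2(x)."""
    (a, b), (c, d) = x
    y = [[0] * n for _ in range(n)]
    y[0][0], y[0][1] = a, -b  # det [[a, -b], [c, d]] = ad + bc = Perm_2
    y[1][0], y[1][1] = c, d
    for i in range(2, n):
        y[i][i] = z  # each extra diagonal z contributes one padding factor
    return y


# Spot check at an arbitrary integer point with n = 4, m = 2.
x = [[1, 2], [3, 4]]
z = 5
assert determinant(padded_matrix(x, z, 4)) == z ** 2 * permanent(x)
```

The block-diagonal shape makes the degree alignment visible: the $2 \times 2$ block carries the permanent, and the remaining $n - 2$ diagonal entries supply exactly the factor $z^{n-m}$.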

3. No-Go Theorems and the Limits of Flattening Techniques

Flattening-based lower-bound techniques, notably shifted partial derivatives, extend classical methods by considering various partial differentiation and multiplication patterns to distinguish polynomials such as the permanent and determinant. However, Efremenko–Landsberg–Schenck–Weyman proved a "no-go" theorem: for all sufficiently large $n$,

$\dim \langle \partial^{=e}(z^{n-m}\operatorname{Perm}_m) \rangle_{=\tau} \leq \dim \langle \partial^{=e}\operatorname{Det}_n\rangle_{=\tau},$

for every choice of $(e, \tau)$. Hence, shifted partials cannot separate the padded permanent from the determinant once $n$ exceeds a moderate polynomial in $m$. Mulmuley asked whether this barrier could be avoided if padding were eliminated. Gesmundo–Landsberg showed that even in the natural, unpadded model, which compares $\operatorname{Perm}_m$ to the iterated matrix multiplication (IMM) polynomial $\operatorname{IMM}^d_n$ with $d = m$ (which is $VP_s$-complete and does not require padding variables), shifted partials still cannot prove superpolynomial lower bounds. Specifically, for all $n > m^5$ and all $e, \tau$:

$\dim \langle \partial^{=e}\operatorname{Perm}_m\rangle_{=\tau} \leq \dim \langle \partial^{=e} \operatorname{IMM}^m_n \rangle_{=\tau}$

(Gesmundo et al., 2017). This demonstrates that the barrier is not an artifact of padding but rather reflects a deeper limitation of the shifted-partials method itself.
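
To make the quantity in these inequalities concrete, the following sketch computes the dimension of a shifted-partials space for toy inputs. It assumes the usual reading of $\langle \partial^{=e} P \rangle_{=\tau}$ as the span of all order-$e$ partial derivatives of $P$ multiplied by all monomials of degree $\tau$. The function names and the dictionary encoding of polynomials are illustrative choices, and the example only evaluates the measure on $2 \times 2$ inputs; it does not reproduce the theorems above.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def derivative(poly, i):
    """Partial derivative of {exponent tuple: coeff} w.r.t. variable i."""
    out = {}
    for exps, coeff in poly.items():
        if exps[i] > 0:
            new = list(exps)
            new[i] -= 1
            out[tuple(new)] = out.get(tuple(new), 0) + coeff * exps[i]
    return out


def shift(poly, mono):
    """Multiply a polynomial by the monomial with exponent vector mono."""
    return {tuple(a + b for a, b in zip(exps, mono)): c for exps, c in poly.items()}


def span_dimension(polys):
    """Rank of a list of polynomials, by exact Gaussian elimination."""
    basis = {}  # leading monomial -> polynomial normalized to leading coeff 1
    for poly in polys:
        p = {m: Fraction(c) for m, c in poly.items() if c != 0}
        while p:
            lead = max(p)  # largest exponent tuple in lexicographic order
            if lead not in basis:
                c = p[lead]
                basis[lead] = {m: v / c for m, v in p.items()}
                break
            factor = p[lead]
            for m, v in basis[lead].items():
                new = p.get(m, 0) - factor * v
                if new == 0:
                    p.pop(m, None)
                else:
                    p[m] = new
    return len(basis)


def shifted_partials_dim(poly, nvars, e, tau):
    """dim span{ x^alpha * d^beta(poly) : |alpha| = tau, |beta| = e }."""
    partials = []
    for idx in combinations_with_replacement(range(nvars), e):
        d = poly
        for i in idx:
            d = derivative(d, i)
        if d:
            partials.append(d)
    shifted = []
    for idx in combinations_with_replacement(range(nvars), tau):
        mono = [0] * nvars
        for i in idx:
            mono[i] += 1
        shifted.extend(shift(d, mono) for d in partials)
    return span_dimension(shifted)


# Variables ordered as (x11, x12, x21, x22).
PERM_2 = {(1, 0, 0, 1): 1, (0, 1, 1, 0): 1}
DET_2 = {(1, 0, 0, 1): 1, (0, 1, 1, 0): -1}

assert shifted_partials_dim(PERM_2, 4, 1, 1) == shifted_partials_dim(DET_2, 4, 1, 1)
```

For these inputs the first-order partials of either polynomial span all four variables, so shifting by degree-one monomials produces every quadratic monomial and the two dimensions coincide.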

4. Padding in Fast Polynomial Arithmetic

Outside of algebraic complexity, polynomial padding arises as a standard tool in the implementation of fast polynomial multiplication, particularly when employing FFT/NTT-based algorithms. Suppose $a(x), b(x) \in \mathbb{Z}_q[x]/(x^n+1)$, with $n$ a power of two. To compute their product without cyclic wraparound corrupting the coefficients, the inputs are zero-padded from length $n$ to $N = 2n$:

$a_{\text{pad}}(x) = \sum_{i=0}^{2n-1} a'_i x^i, \quad a'_i = a_i \text{ for } 0 \le i < n, \quad a'_i = 0 \text{ for } n \le i < 2n,$

and similarly for $b$. The product of two polynomials of degree less than $n$ has degree at most $2n-2$, so the circular convolution of the zero-padded vectors via an $N$-point NTT yields the full product $c'(x) = \sum_{i=0}^{2n-1} c'_i x^i$ with no overlap between high- and low-degree coefficients. This product is then folded back modulo $x^n+1$ using $x^n \equiv -1$:

$p(x) = \sum_{i=0}^{n-1} (c'_i - c'_{i+n}) x^i \mod q,$

where $c'_i$ are obtained after the inverse NTT. Alternatives such as negative wrapped convolution (NWC) are more efficient in certain hardware settings, but zero-padding remains attractive for uniform transform handling and implementation simplicity (Chiu et al., 2023).
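
The zero-pad, transform, multiply, and fold steps can be sketched as follows. This is a minimal illustration rather than the cited implementation: the transform is a naive $O(N^2)$ evaluation instead of a fast butterfly NTT, and the parameters $q = 17$, $n = 4$, and the primitive $8$th root of unity $2$ modulo $17$ are toy values chosen here.

```python
def ntt(vec, root, q):
    """Naive O(N^2) number-theoretic transform; root must be a primitive
    len(vec)-th root of unity modulo the prime q."""
    n = len(vec)
    return [
        sum(vec[j] * pow(root, j * k, q) for j in range(n)) % q for k in range(n)
    ]


def intt(vec, root, q):
    """Inverse transform: use root^-1 and scale by N^-1 (Python 3.8+)."""
    n = len(vec)
    inv_root = pow(root, -1, q)
    inv_n = pow(n, -1, q)
    return [(inv_n * x) % q for x in ntt(vec, inv_root, q)]


def negacyclic_mul_zero_padded(a, b, root, q):
    """Multiply a, b in Z_q[x]/(x^n + 1) by zero-padding to length 2n."""
    n = len(a)
    a_pad = list(a) + [0] * n  # a'_i = 0 for n <= i < 2n
    b_pad = list(b) + [0] * n
    fa = ntt(a_pad, root, q)
    fb = ntt(b_pad, root, q)
    c = intt([x * y % q for x, y in zip(fa, fb)], root, q)  # full product c'
    return [(c[i] - c[i + n]) % q for i in range(n)]  # fold with x^n = -1


def negacyclic_mul_schoolbook(a, b, q):
    """Reference O(n^2) multiplication modulo x^n + 1, for cross-checking."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                out[k] = (out[k] + a[i] * b[j]) % q
            else:
                out[k - n] = (out[k - n] - a[i] * b[j]) % q
    return out


Q, ROOT = 17, 2  # 2 has multiplicative order 8 modulo 17
a, b = [1, 2, 3, 4], [5, 6, 7, 8]
assert negacyclic_mul_zero_padded(a, b, ROOT, Q) == negacyclic_mul_schoolbook(a, b, Q)
```

Because the padded length $2n$ exceeds the degree of the true product, the cyclic convolution never wraps, and the only reduction is the explicit fold in the last line of `negacyclic_mul_zero_padded`.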

5. Padding and Expressive Power in Transformer Architectures

In neural sequence models, particularly transformers, polynomial padding refers to the augmentation of an input sequence $w \in \Sigma^n$ with $P(n) = n^k$ "blank" (padding) tokens, creating $w' = w\Vert \square^{P(n)}$. For averaging-hard-attention, masked-pre-norm transformers of fixed depth, allowing polynomially many padding tokens at inference time expands the model's expressive power to exactly the FO-uniform $\mathsf{TC}^0$ class: the set of problems computable by uniform constant-depth threshold circuits. This upper bound is tight, since such padded transformers can simulate the FO[$\mathsf{M}^2$] logic that characterizes $\mathsf{TC}^0$. Furthermore, coupling polynomial padding with polylogarithmic-depth looping recovers exactly the hierarchy $\mathsf{TC}^d$ (and, in the limit, the class $\mathsf{NC}$) (Merrill et al., 2025).
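
The input construction $w' = w\Vert \square^{P(n)}$ is simple to state in code. The sketch below only builds the padded sequence; the function name `pad_sequence` and the choice of the string "□" as the blank token are assumptions made for illustration, and nothing here models the transformer itself.

```python
def pad_sequence(tokens, k, blank="□"):
    """Append P(n) = n**k blank tokens to a length-n input sequence."""
    n = len(tokens)
    return list(tokens) + [blank] * (n ** k)


padded = pad_sequence(list("abc"), k=2)
# The padded length is n + n**k, here 3 + 9.
assert len(padded) == 3 + 3 ** 2
```

The original tokens are left untouched at the front of the sequence, so the blanks act purely as extra positions over which the fixed-depth network can compute in parallel.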

Theoretical implications include the ability to embed classical reductions and completeness arguments from circuit complexity inside transformers; for instance, every FO-reduction can be represented and computed via sequence padding mechanisms. Padding and looping thus provide a throughput-parallel alternative to sequential "chain-of-thought" reasoning, without loss of parallelism.

6. Koszul Flattenings and Barriers Beyond Padding

Beyond flattenings induced by (shifted) partials, Koszul flattenings have been introduced to surpass the partial-derivative barrier, at least additively, for explicit families of polynomials. For certain odd-degree analogues, the Koszul flattening technique yields stronger lower bounds for symmetric border rank than obtainable from partials:

$\operatorname{rank}\left((\#_1 f_{n,k})_{k, k+1}^{\wedge q}\right) \geq \binom{n-1}{q}\left(\binom{n+k-1}{k}+q-1\right),$

strictly exceeding the maximum from ordinary partial derivatives. However, these improvements remain modest and reinforce the conclusion that eliminating padding is insufficient for fundamentally breaking the shifted-partials barrier: new group-equivariant or non-flattening methodologies are required for further progress on separating complexity classes such as $VP$ and $VNP$ (Gesmundo et al., 2017).

7. Applications and Classical Examples

Polynomial padding appears in a variety of mathematical and algorithmic contexts:

Low-dimensional algebraic geometry: The relation between double root loci (binary quartics with repeated roots) and paddings, or projective geometry embeddings via Segre–Veronese maps (Kadish et al., 2012).
Fast modular multiplication in cryptography: Efficient NTT-based algorithms for homomorphic encryption or lattice-based cryptosystems rely foundationally on zero-padded convolution (Chiu et al., 2023).

A summary table of core uses follows:

Context	Purpose of Padding	Reference
Permanent-det Comparison	Degree-raising, specialization	(Gesmundo et al., 2017)
Fast Polynomial Multiplication	Prevent cyclic convolution aliasing	(Chiu et al., 2023)
Transformer Expressive Power	Parallelization, width expansion	(Merrill et al., 2025)
Geometric Complexity Theory	Defining $V_{k,d}$ , variety structure	(Kadish et al., 2012)

Each instance exploits structural advantages unique to the introduced padding: degree alignment, elimination of modular artifacts, enhanced representational expressivity, or tractable geometric locus characterization.

In summary, polynomial padding is a foundational operation at the interface of algebra, complexity theory, symbolic algorithms, and modern machine-learning systems; its role is both technical and conceptual, enabling uniform embeddings, variety definitions, and structural simulations that would otherwise be inaccessible. Results over the past decade have rigorously delimited its advantages and limitations, highlighting the necessity of fundamentally novel techniques to overcome related lower-bound and expressivity barriers.


References

Padded polynomials, their cousins, and geometric complexity theory (2012)

Explicit polynomial sequences with maximal spaces of partial derivatives and a question of K. Mulmuley (2017)

Long Polynomial Modular Multiplication using Low-Complexity Number Theoretic Transform (2023)

Exact Expressive Power of Transformers with Padding (2025)
