Maximal Subword Complexity

Updated 6 February 2026

Maximal subword complexity is a measure that quantifies the maximal count of distinct contiguous subwords (factors) in a finite or infinite sequence over a finite alphabet.
It is computed using the subword complexity function p(n), providing insights into the combinatorial extremality of sequences, exemplified by de Bruijn words and m-sequences.
This concept influences various fields such as automata theory, cryptography, and sequence design by revealing links to extremal properties and nondeterministic automatic complexity.

Maximal subword complexity quantifies, for a finite or infinite word over a finite alphabet, the maximal possible number of distinct contiguous subwords (also called factors or blocks) of given length. This notion captures the combinatorial extremality of certain sequences, providing a central tool in the analysis of sequences with high combinatorial, algorithmic, or pseudorandom structure. Maximal subword complexity is critical in fields such as combinatorics on words, finite automata theory, and sequence design for communication and cryptography.

1. Core Definitions and General Properties

Let $\mathcal{A}$ be a finite alphabet of cardinality $q$, and let $w \in \mathcal{A}^N$ be a finite word of length $N$. The subword complexity function $p_w(n)$ counts the number of distinct contiguous factors of $w$ of length $n$:

$p_w(n) = |\{ w_{i}w_{i+1}\dots w_{i+n-1} : 1 \leq i \leq N-n+1 \}|, \quad 1 \leq n \leq N.$

The maximal complexity of a word $w$ is defined as

$C(w) = \max_{1 \leq n \leq |w|} p_w(n).$

The global maximal complexity for fixed $N$ is

$K(N) = \max_{w \in \mathcal{A}^N} C(w),$

and the set of block lengths at which this maximum is attained is denoted by

$R(N) = \left\{ n : 1 \leq n \leq N,\ \exists w \in \mathcal{A}^N,\ p_w(n)=K(N) \right\}.$

$M(N)$ denotes the number of words $w$ of length $N$ with $C(w) = K(N)$. For infinite words $\xi \in \mathcal{A}^\omega$, the subword complexity at $n$ is $C_\xi(n) = |\{ u \in \mathcal{A}^n : u \text{ appears as a factor in } \xi \}|$.

Maximal subword complexity is governed by the trivial combinatorial upper bound $p_w(n) \leq \min\{q^n,\,N-n+1\}$ for words of length $N$, since at most $q^n$ words of length $n$ exist and a word of length $N$ has only $N-n+1$ starting positions for a factor of length $n$ (Anisiu et al., 2010).
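As a minimal sketch of these definitions, the following Python computes $p_w(n)$ and $C(w)$ for a finite word given as a string and checks each value against the counting bound (at most $q^n$ words, at most $N-n+1$ windows). The function names and the sample word `0010111` are illustrative choices, not taken from the cited papers.

```python
def subword_complexity(w: str, n: int) -> int:
    """p_w(n): number of distinct contiguous factors of length n in w."""
    return len({w[i:i + n] for i in range(len(w) - n + 1)})


def complexity_profile(w: str) -> list[int]:
    """The list [p_w(1), ..., p_w(|w|)]."""
    return [subword_complexity(w, n) for n in range(1, len(w) + 1)]


def maximal_complexity(w: str) -> int:
    """C(w) = max over 1 <= n <= |w| of p_w(n); assumes w is non-empty."""
    return max(complexity_profile(w))


w, q = "0010111", 2  # illustrative binary word, N = 7
profile = complexity_profile(w)
# Every value respects the bound min(q^n, N - n + 1).
bounds = [min(q ** n, len(w) - n + 1) for n in range(1, len(w) + 1)]
assert all(p <= b for p, b in zip(profile, bounds))
print(profile, maximal_complexity(w))  # [2, 4, 5, 4, 3, 2, 1] 5
```

For this particular word the bound is met at every length, so $C(w) = 5$ is the largest value any binary word of length 7 can reach.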

2. Extremal Sequences and Characterizing Maximal Complexity

De Bruijn–Martin Words: Extremal words achieving $K(N)$ are precisely prefixes of de Bruijn words (de Bruijn–Martin words). A de Bruijn word of order $m$ is a shortest word containing all $q^m$ length-$m$ factors exactly once. The extremal word construction proceeds as follows:

For $q^k + k \leq N \leq q^{k+1} + k$, $K(N) = N - k$ is achieved at $n = k + 1$.
Such an extremal word $w$ is a prefix of a de Bruijn word of order $k+1$.
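The statement above can be checked on a small binary case. The sketch below builds one de Bruijn word with the classical greedy "prefer-one" rule (start from $0^m$ and append 1 whenever that creates a new length-$m$ factor, otherwise 0). This is just one convenient construction chosen for illustration, and the function names are hypothetical. The loop then confirms that every prefix of length $N$ in the stated range has $N-k$ distinct factors of length $k+1$ and no larger value at any other length.

```python
def prefer_one_de_bruijn(m: int) -> str:
    """Binary linear de Bruijn word of order m built by the prefer-one greedy rule."""
    word = "0" * m
    seen = {word}
    while True:
        suffix = word[len(word) - m + 1:]  # last m-1 letters (empty when m == 1)
        for letter in "10":  # try 1 first, then 0
            factor = suffix + letter
            if factor not in seen:
                seen.add(factor)
                word += letter
                break
        else:
            return word  # neither extension gives a new factor


def subword_complexity(w: str, n: int) -> int:
    """Number of distinct factors of length n in w."""
    return len({w[i:i + n] for i in range(len(w) - n + 1)})


k = 2
db = prefer_one_de_bruijn(k + 1)   # order k+1 = 3
assert len(db) == 2 ** (k + 1) + k  # length q^m + m - 1 with m = k+1
for N in range(2 ** k + k, 2 ** (k + 1) + k + 1):
    prefix = db[:N]
    profile = [subword_complexity(prefix, n) for n in range(1, N + 1)]
    # K(N) = N - k, attained at block length n = k + 1 (list index k).
    assert max(profile) == N - k == profile[k]
print(db)  # 0001110100
```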

m-Sequences (Maximal-Length LFSR Sequences): For binary words produced by a $k$-stage linear feedback shift register (LFSR) with primitive characteristic polynomial (producing so-called m-sequences), maximal cyclic subword complexity is achieved: for period $n=2^k-1$, $p_{x^\infty}(\ell)=2^\ell$ for $1 \leq \ell \leq k-1$, and $p_{x^\infty}(\ell)=n$ for $k \leq \ell \leq n$ (Kjos-Hanssen, 2016). This reflects the property that the LFSR, over one period, generates all nonzero $k$-tuples precisely once (Golomb’s sliding-window property).
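The cyclic complexity profile of an m-sequence can be reproduced with a few lines of Python. The sketch below assumes a Fibonacci-style convention in which the polynomial $x^4 + x + 1$ corresponds to the recurrence $s_{n+4} = s_{n+1} + s_n$ over $\mathbb{F}_2$; other texts index the taps differently. The function names and the seed are illustrative.

```python
def lfsr_period(taps: tuple[int, ...], seed: tuple[int, ...]) -> list[int]:
    """One period of the binary sequence s[n+k] = XOR of s[n+t] for t in taps.

    Assumes 0 is among the taps, so the state update is invertible and the
    state is guaranteed to return to the seed.
    """
    state = tuple(seed)
    out = []
    while True:
        out.append(state[0])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = state[1:] + (feedback,)
        if state == tuple(seed):
            return out


def cyclic_complexity(x: list[int], ell: int) -> int:
    """Number of distinct length-ell factors of the periodic word x^infinity."""
    n = len(x)
    return len({tuple(x[(i + j) % n] for j in range(ell)) for i in range(n)})


k = 4
x = lfsr_period(taps=(0, 1), seed=(0, 0, 0, 1))  # s[n+4] = s[n+1] + s[n]
assert len(x) == 2 ** k - 1                       # period 15, so this is an m-sequence
profile = [cyclic_complexity(x, ell) for ell in range(1, len(x) + 1)]
print("".join(map(str, x)))  # 000100110101111
print(profile[:5])           # [2, 4, 8, 15, 15]
```

The window of length $k = 4$ takes 15 distinct values, which are exactly the nonzero 4-tuples, and the count stays at 15 for every longer window up to the period.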

Classes: De Bruijn words and (by extension) m-sequences represent sequences achieving the combinatorial upper bound for subword complexity; they play a foundational role in sequence design and extremal combinatorics (Anisiu et al., 2010, Kjos-Hanssen, 2016).

3. Exact Formulae, Enumeration, and Graph Structure

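The enumeration and structural theory of maximal complexity are tightly connected to directed de Bruijn graphs. The vertices of $B(q,k)$ are the $q^k$ words of length $k$, and there is an edge from $u$ to $v$ whenever $v$ is obtained from $u$ by dropping its first letter and appending one letter. An extremal word corresponds to a path in such a graph that repeats no vertex; for $N=q^k+k-1$ the path visits every vertex of $B(q,k)$ and arises from a Hamiltonian cycle.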

Key Results (Anisiu et al., 2010):

The maximal complexity $K(N)$ for $q^k + k \leq N \leq q^{k+1} + k$ is $K(N)=N-k$, attained for $n = k+1$.
For such $N$:
- $M(N)$ = number of directed paths of length $N-k-1$ in $B(q,k+1)$.
- For $N=q^k+k-1$, $M(N)$ counts Hamiltonian cycles: for $q=2$, $M(2^k+k-1) = 2^{2^{k-1}}$, since each of the $2^{2^{k-1}-k}$ cyclic de Bruijn words of order $k$ can be cut open at any of its $2^k$ positions.

This correspondence yields explicit counts for small $N$, as shown in the table below for selected binary cases with $N \leq 10$:

N	K(N)	R(N)	M(N)
1	1	{1}	2
4	3	{2}	8
7	5	{3}	42
10	8	{3}	16

For general $q$ and $N$ , the enumeration of extremal words reduces to counting paths in the appropriate de Bruijn graph—a computation efficiently handled via adjacency matrix powers, but lacking a closed form in $N$ in the general case.
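For very small $N$ the quantities $K(N)$, $R(N)$ and $M(N)$ can also be obtained by exhaustive search, which gives an independent check on individual table rows. The sketch below is a brute-force enumeration over all $q^N$ words, not the graph-based method of the cited paper, and the function names are illustrative. It is only practical for short words.

```python
from itertools import product


def subword_complexity(w: str, n: int) -> int:
    """Number of distinct factors of length n in w."""
    return len({w[i:i + n] for i in range(len(w) - n + 1)})


def extremal_stats(N: int, alphabet: str = "01") -> tuple[int, set[int], int]:
    """Return (K(N), R(N), M(N)) by enumerating every word of length N."""
    best, lengths, count = 0, set(), 0
    for letters in product(alphabet, repeat=N):
        w = "".join(letters)
        profile = [subword_complexity(w, n) for n in range(1, N + 1)]
        c = max(profile)
        if c > best:  # new record: restart the bookkeeping
            best, lengths, count = c, set(), 0
        if c == best:
            count += 1
            lengths.update(n for n, v in enumerate(profile, 1) if v == c)
    return best, lengths, count


for N in (1, 4, 10):
    print(N, *extremal_stats(N))
```

Run this way, the rows for $N = 1$, $4$ and $10$ come out as $(1, \{1\}, 2)$, $(3, \{2\}, 8)$ and $(8, \{3\}, 16)$, in agreement with the table.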

4. Maximal Complexity for Infinite and Structured Words

For infinite sequences, the subword complexity $C_\xi(n)$ is of primary interest.

Quasiperiodic Words: Given a fixed quasiperiod $q$ (here a finite word, not the alphabet size), the set of infinite quasiperiodic words has subword complexity characterized asymptotically as $C_\xi(n) = \Theta(\lambda_q^n)$, where $\lambda_q$ is the largest positive root of a polynomial determined by the code structure derived from $q$ (Polley et al., 2010). The universal upper bound within this class is $t_P^n$, where $t_P \approx 1.324718$ is the real root of $t^3 - t - 1 = 0$.
Automatic Sequences and Arithmetical Subword Complexity: For $k$-automatic sequences $a$ over alphabet $\Omega$, the arithmetical subword complexity $p_a^{\mathrm{arith}}(\ell)$ counts length-$\ell$ words appearing along arbitrary arithmetic progressions. Maximal arithmetical subword complexity is defined by $p_a^{\mathrm{arith}}(\ell) = |\Omega|^\ell$ for all $\ell$, and is classified via the effective alphabet size $r(a)$: maximal complexity occurs if and only if $r(a) = |\Omega|$ (Konieczny et al., 2023).

Classes of automatic sequences exhibit a dichotomy:

Block-additive (e.g., Thue–Morse, digital sum modulo $m$ ) and certain nondegenerate cases achieve maximal arithmetical complexity.
Periodic and forward/backward synchronizing cases have polynomial (even subexponential) complexity, with $r(a)=1$ .
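The difference between ordinary and arithmetical subword complexity is visible already on a short prefix of the Thue–Morse sequence. The sketch below is a finite experiment under stated assumptions: it looks only at the first 64 terms and at pattern lengths up to 4, so it illustrates the definition rather than proving anything about all $\ell$. The function names are hypothetical.

```python
def thue_morse(length: int) -> list[int]:
    """First `length` terms: t(n) = parity of the number of 1 bits of n."""
    return [bin(n).count("1") % 2 for n in range(length)]


def factor_count(seq: list[int], ell: int) -> int:
    """Distinct contiguous factors of length ell in the finite prefix."""
    return len({tuple(seq[i:i + ell]) for i in range(len(seq) - ell + 1)})


def arithmetic_pattern_count(seq: list[int], ell: int) -> int:
    """Distinct words seq[a], seq[a+d], ..., seq[a+(ell-1)d] with d >= 1."""
    size = len(seq)
    found = set()
    for a in range(size):
        # Largest step that keeps the whole progression inside the prefix.
        max_d = (size - 1 - a) // (ell - 1) if ell > 1 else 1
        for d in range(1, max_d + 1):
            found.add(tuple(seq[a + j * d] for j in range(ell)))
    return len(found)


t = thue_morse(64)
ordinary = [factor_count(t, ell) for ell in range(1, 5)]
arithmetic = [arithmetic_pattern_count(t, ell) for ell in range(1, 5)]
print(ordinary)    # [2, 4, 6, 10]
print(arithmetic)  # [2, 4, 8, 16]
```

Contiguous windows never show 000 or 111, while progressions such as positions 0, 3, 6 and 1, 4, 7 do, which is why the arithmetical count reaches $2^\ell$ in this range.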

5. Implications for Automatic and Algorithmic Complexity

Maximal subword complexity directly influences the nondeterministic automatic complexity $A_N(x)$, the minimal state count in an NFA with a unique accepting path consuming $x$. Words with maximal subword complexity, such as m-sequences, force $A_N(x) \geq |x|/2 - O(\log^2 |x|)$. This is asymptotically close to the worst case for arbitrary words, as $A_N(y)\leq \lfloor n/2 \rfloor + 1$ for all words $y$ of length $n$ (Hyde’s result). Thus, m-sequences are nearly extremal for nondeterministic automatic complexity, and the combinatorial extremality is reflected at the automaton level (Kjos-Hanssen, 2016).

6. Broader Context and Related Structures

The study of maximal subword complexity connects with diverse themes:

Pseudorandomness and Linear Complexity: m-sequences are pseudorandom in the linear complexity sense while also being extremal for subword and automatic complexity (Kjos-Hanssen, 2016).
Generalizations: $q$-ary analogues (over $\mathbb{F}_q$) realize the corresponding maxima, with $p(\ell)=q^\ell$ for $\ell < k$ and $p(\ell)=q^k-1$ for $\ell \geq k$ in the LFSR context.
Open Problems: The gap in the $O(\log^2 n)$ term for $A_N$ for m-sequences versus the universal upper bound remains open; counting exact numbers of extremal words for arbitrary $N$ and $q$ also remains unresolved (Anisiu et al., 2010, Kjos-Hanssen, 2016).
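The $q$-ary generalization listed above can be checked on a small ternary case. The sketch below uses the recurrence $s_{n+2} = 2s_{n+1} + s_n$ over $\mathbb{F}_3$, whose characteristic polynomial is $x^2 + x + 2$. This recurrence is an illustrative choice rather than one taken from the cited papers, and the code confirms the full period $3^2 - 1 = 8$ by computation instead of assuming it.

```python
def recurrence_period(coeffs: tuple[int, ...], seed: tuple[int, ...], q: int) -> list[int]:
    """One period of s[n+k] = sum(coeffs[i] * s[n+i]) mod q, with k = len(seed).

    Assumes q is prime and coeffs[0] is nonzero mod q, so the state update is
    invertible and the state returns to the seed.
    """
    state = tuple(seed)
    out = []
    while True:
        out.append(state[0])
        nxt = sum(c * s for c, s in zip(coeffs, state)) % q
        state = state[1:] + (nxt,)
        if state == tuple(seed):
            return out


def cyclic_complexity(x: list[int], ell: int) -> int:
    """Number of distinct length-ell factors of the periodic word x^infinity."""
    n = len(x)
    return len({tuple(x[(i + j) % n] for j in range(ell)) for i in range(n)})


q, k = 3, 2
x = recurrence_period(coeffs=(1, 2), seed=(0, 1), q=q)  # s[n+2] = s[n] + 2*s[n+1]
assert len(x) == q ** k - 1                              # maximal period 8
profile = [cyclic_complexity(x, ell) for ell in range(1, len(x) + 1)]
print(x)        # [0, 1, 2, 2, 0, 2, 1, 1]
print(profile)  # [3, 8, 8, 8, 8, 8, 8, 8]
```

Here $p(1) = q = 3$ and $p(\ell) = q^k - 1 = 8$ for every $\ell \geq k = 2$, matching the stated pattern.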

7. Representative Examples and Realizations

Explicit constructions and computations for small alphabets and periods exemplify maximal subword complexity:

For $k=3$, $n=7$, the m-sequence $x=0010111$ (generated by $x^3 + x + 1$) satisfies $p_{x^\infty}(1)=2$, $p_{x^\infty}(2)=4$, $p_{x^\infty}(3)=7$, and $p_{x^\infty}(\ell)=7$ for $4 \leq \ell \leq 7$, matching the combinatorial upper bounds.
For $k=4$, $n=15$, the sequence from $x^4 + x + 1$ achieves $p(1)=2$, $p(2)=4$, $p(3)=8$, and $p(\ell)=15$ for $4 \leq \ell \leq 15$ (Kjos-Hanssen, 2016).
Thue–Morse and similar automatic sequences realize maximal arithmetical subword complexity via Gowers uniformity and Fourier-theoretic arguments, ensuring every pattern appears along an arithmetic progression (Konieczny et al., 2023).

These examples illustrate the structural and combinatorial richness of maximal subword complexity and its deep interconnections across automata, combinatorics, and algebraic constructions.


References (4)

Anisiu et al. (2010). Maximal Complexity of Finite Words.

Kjos-Hanssen (2016). Automatic complexity of shift register sequences.

Polley et al. (2010). The Maximal Subword Complexity of Quasiperiodic Infinite Words.

Konieczny et al. (2023). Arithmetical subword complexity of automatic sequences.
