Maximal Subword Complexity

Updated 6 February 2026
  • Maximal subword complexity is a measure that quantifies the maximal count of distinct contiguous subwords (factors) in a finite or infinite sequence over a finite alphabet.
  • It is computed using the subword complexity function p(n), providing insights into the combinatorial extremality of sequences, exemplified by de Bruijn words and m-sequences.
  • This concept influences various fields such as automata theory, cryptography, and sequence design by revealing links to extremal properties and nondeterministic automatic complexity.

Maximal subword complexity quantifies, for a finite or infinite word over a finite alphabet, the maximal possible number of distinct contiguous subwords (also called factors or blocks) of given length. This notion captures the combinatorial extremality of certain sequences, providing a central tool in the analysis of sequences with high combinatorial, algorithmic, or pseudorandom structure. Maximal subword complexity is critical in fields such as combinatorics on words, finite automata theory, and sequence design for communication and cryptography.

1. Core Definitions and General Properties

Let $\mathcal{A}$ be a finite alphabet of cardinality $q \geq 1$, and let $w \in \mathcal{A}^N$ be a finite word of length $N$. The subword complexity function $p_w(n)$ counts the number of distinct contiguous factors of $w$ of length $n$:

$$p_w(n) = |\{ w_{i}w_{i+1}\dots w_{i+n-1} : 1 \leq i \leq N-n+1 \}|, \quad 1 \leq n \leq N.$$

The maximal complexity of a word $w$ is defined as

$$C(w) = \max_{1 \leq n \leq |w|} p_w(n).$$

The global maximal complexity for fixed $N$ is

$$K(N) = \max_{w \in \mathcal{A}^N} C(w),$$

and the set of block lengths at which this maximum is attained is denoted by

$$R(N) = \{\, n \in \{1, \dots, N\} : p_w(n) = K(N) \text{ for some } w \in \mathcal{A}^N \,\}.$$

$M(N)$ denotes the number of words of length $N$ attaining $K(N)$. For infinite words $x$, the subword complexity at $n$ is $p_x(n)$, the number of distinct length-$n$ factors of $x$.

Maximal subword complexity is governed by the trivial combinatorial upper bound $p_w(n) \leq \min(q^n, N-n+1)$ for words of length $N$, since at most $q^n$ words of length $n$ exist, and at most $N-n+1$ distinct position shifts fit in a word of length $N$ (Anisiu et al., 2010).
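The definitions above translate directly into a few lines of Python. The following is a minimal sketch, not taken from the cited papers; the function names `subword_complexity`, `maximal_complexity`, and `trivial_bound` are illustrative, and words are assumed to be non-empty Python strings.

```python
def subword_complexity(w, n):
    """Number of distinct contiguous factors of length n in the finite word w."""
    if n < 1 or n > len(w):
        raise ValueError("n must satisfy 1 <= n <= len(w)")
    return len({w[i:i + n] for i in range(len(w) - n + 1)})


def maximal_complexity(w):
    """C(w): the maximum of p_w(n) over all block lengths 1 <= n <= |w|."""
    return max(subword_complexity(w, n) for n in range(1, len(w) + 1))


def trivial_bound(q, N, n):
    """Upper bound min(q^n, N - n + 1) on p_w(n) for a word of length N."""
    return min(q ** n, N - n + 1)


# Example word over {0, 1} (chosen for illustration).
w = "0001011100"
q, N = 2, len(w)
profile = [subword_complexity(w, n) for n in range(1, N + 1)]
print(profile)                # complexity profile p_w(1), ..., p_w(N)
print(maximal_complexity(w))  # C(w)
# The trivial bound holds at every block length.
assert all(profile[n - 1] <= trivial_bound(q, N, n) for n in range(1, N + 1))
```

For this particular word the bound is met at every block length, which is what extremality means in the sections that follow.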

2. Extremal Sequences and Characterizing Maximal Complexity

De Bruijn–Martin Words: Extremal words achieving the bound $\min(q^n, N-n+1)$ are closely tied to de Bruijn words (de Bruijn–Martin words). A de Bruijn word of order $k$ is a shortest word containing all $q^k$ length-$k$ factors exactly once; it has length $q^k + k - 1$. The extremal word construction proceeds as follows:

  • For $N \leq q^k + k - 1$, the value $p_w(k) = N - k + 1$ is achieved by any word $w$ of length $N$ whose length-$k$ factors are pairwise distinct.
  • Every length-$N$ prefix of a de Bruijn word of order $k$ is such a word $w$.
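One way to obtain such words in practice is the greedy "prefer-largest" construction usually attributed to Martin. The sketch below is an illustration rather than the construction used in the cited paper; the names `martin_de_bruijn` and `factors` are hypothetical, and letters are assumed to be the integers $0, \dots, q-1$.

```python
def martin_de_bruijn(q, k):
    """Greedy construction of a linear de Bruijn word of order k over {0, ..., q-1}.

    Start from k zeros and repeatedly append the largest letter that creates
    a length-k factor not seen before; stop when no letter qualifies.
    """
    word = [0] * k
    seen = {tuple(word)}
    while True:
        suffix = word[len(word) - k + 1:]  # last k-1 letters (empty when k == 1)
        for letter in range(q - 1, -1, -1):
            candidate = tuple(suffix + [letter])
            if candidate not in seen:
                seen.add(candidate)
                word.append(letter)
                break
        else:
            return word  # every extension repeats a factor: construction is done


def factors(w, n):
    """Set of distinct length-n factors of the list w."""
    return {tuple(w[i:i + n]) for i in range(len(w) - n + 1)}


w = martin_de_bruijn(2, 3)
print("".join(map(str, w)))  # 0001110100

# Every prefix of length N >= k has N - k + 1 distinct length-k factors.
k = 3
for N in range(k, len(w) + 1):
    assert len(factors(w[:N], k)) == min(2 ** k, N - k + 1)
```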

m-Sequences (Maximal-Length LFSR Sequences): For binary words produced by a $k$-stage linear feedback shift register (LFSR) with primitive characteristic polynomial (so-called m-sequences), maximal cyclic subword complexity is achieved: for period $2^k - 1$, $p(n) = 2^n$ for $n < k$, and $p(n) = 2^k - 1$ for $n \geq k$ (Kjos-Hanssen, 2016). This reflects the property that the LFSR, over one period, generates all nonzero $k$-tuples precisely once (Golomb's sliding-window property).
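The window property can be checked numerically on a small register. The sketch below assumes the primitive polynomial $x^4 + x + 1$ over $\mathbb{F}_2$, which corresponds to the recurrence $s_{t+4} = s_{t+1} + s_t \pmod 2$; the polynomial, the seed, and the helper names `lfsr_period` and `cyclic_complexity` are choices made for illustration, not details from the cited paper.

```python
def lfsr_period(taps, k, seed):
    """One period (2^k - 1 bits) of a binary LFSR.

    The state holds s_t, ..., s_{t+k-1}; the next bit is the sum mod 2 of the
    state entries at the offsets listed in `taps`.
    """
    state = list(seed)
    out = []
    for _ in range(2 ** k - 1):
        out.append(state[0])
        new_bit = sum(state[j] for j in taps) % 2
        state = state[1:] + [new_bit]
    return out


def cyclic_complexity(x, n):
    """Number of distinct length-n factors of x read cyclically."""
    period = len(x)
    return len({tuple(x[(i + j) % period] for j in range(n)) for i in range(period)})


# Assumed example: x^4 + x + 1, i.e. s_{t+4} = s_{t+1} + s_t (mod 2).
x = lfsr_period(taps=(0, 1), k=4, seed=(1, 0, 0, 0))
print(x)
print([cyclic_complexity(x, n) for n in range(1, 7)])  # [2, 4, 8, 15, 15, 15]
```

The profile doubles until $n = k$ and then stays at the period $2^k - 1 = 15$, since the all-zero 4-tuple never occurs.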

Classes: De Bruijn words and (by extension) m-sequences represent sequences achieving the combinatorial upper bound for subword complexity; they play a foundational role in sequence design and extremal combinatorics (Anisiu et al., 2010, Kjos-Hanssen, 2016).

3. Exact Formulae, Enumeration, and Graph Structure

The enumeration and structural theory of maximal complexity are tightly connected to directed de Bruijn graphs. Vertices are the $q^n$ possible length-$n$ words; edges correspond to possible one-letter extensions. A word whose length-$n$ factors are pairwise distinct corresponds to a path that visits no vertex twice, and for $N = q^n + n - 1$ such a path is Hamiltonian, visiting every vertex exactly once.

Key Results (Anisiu et al., 2010):

  • The maximal complexity $K(N)$ for $q^k + k \leq N \leq q^{k+1} + k$ is $N - k$, attained for $n = k + 1$.
  • For such $N$ with $N > q^k + k$:
    • $M(N)$ = number of directed paths with $N - k$ pairwise distinct vertices in the de Bruijn graph on words of length $k + 1$.
    • For $q = 2$ and $N = 2^{k+1} + k$, $M(N)$ counts Hamiltonian paths: $M(N) = 2^{2^{k}}$.
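The path-counting view can be made concrete with a small depth-first search over the de Bruijn graph. This is a sketch under stated assumptions: vertices are tuples of length `n` over $\{0, \dots, q-1\}$, only paths without repeated vertices are counted, and the function name `count_simple_paths` is illustrative. The search is exponential in general and is meant for small parameters only.

```python
from itertools import product


def count_simple_paths(q, n, num_vertices):
    """Count directed paths visiting `num_vertices` distinct vertices in the
    de Bruijn graph whose vertices are the length-n words over {0, ..., q-1}.

    Each such path spells a word of length num_vertices + n - 1 whose
    length-n factors are pairwise distinct.
    """
    def extend(v, visited, remaining):
        if remaining == 0:
            return 1
        total = 0
        for letter in range(q):
            u = v[1:] + (letter,)  # shift left and append one letter
            if u not in visited:
                visited.add(u)
                total += extend(u, visited, remaining - 1)
                visited.remove(u)
        return total

    return sum(
        extend(v, {v}, num_vertices - 1)
        for v in product(range(q), repeat=n)
    )


# Binary words of length 7 with five distinct length-3 factors.
print(count_simple_paths(2, 3, 5))  # 42
# Hamiltonian paths, i.e. linear de Bruijn words of order 3 (length 10).
print(count_simple_paths(2, 3, 8))  # 16
```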

This correspondence yields explicit counts for small $N$, as shown in the table below for binary words ($q = 2$):

| $N$ | $K(N)$ | $R(N)$ | $M(N)$ |
|-----|--------|--------|--------|
| 1   | 1      | {1}    | 2      |
| 4   | 3      | {2}    | 8      |
| 7   | 5      | {3}    | 42     |
| 10  | 8      | {3}    | 16     |
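For word lengths this small, the table entries can be reproduced by brute force over all $q^N$ words. The sketch below is an independent check, not the method of the cited paper; `complexity_profile` and `extremal_statistics` are hypothetical helper names.

```python
from itertools import product


def complexity_profile(w):
    """List [p_w(1), ..., p_w(N)] for a word w given as a tuple."""
    N = len(w)
    return [len({w[i:i + n] for i in range(N - n + 1)}) for n in range(1, N + 1)]


def extremal_statistics(q, N):
    """Return (K(N), R(N), M(N)) by enumerating all q^N words of length N."""
    best, lengths, count = 0, set(), 0
    for w in product(range(q), repeat=N):
        profile = complexity_profile(w)
        c = max(profile)
        if c > best:  # new record: restart the bookkeeping
            best, lengths, count = c, set(), 0
        if c == best:
            count += 1
            lengths.update(n for n, p in enumerate(profile, start=1) if p == best)
    return best, lengths, count


for N in (1, 4, 7, 10):
    print(N, extremal_statistics(2, N))
```

The enumeration cost grows as $q^N$, so this approach is only practical for short words.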

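For general $q$ and $N$, the enumeration of extremal words reduces to counting paths without repeated vertices in the appropriate de Bruijn graph. Adjacency matrix powers count walks that may revisit vertices, so they only bound this number from above; exact values are obtained by exhaustive search for small parameters, and no closed form in $N$ is available in the general case.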

4. Maximal Complexity for Infinite and Structured Words

For infinite sequences $x$, the growth of the subword complexity $p_x(n)$ as a function of $n$ is of primary interest.

  • Quasiperiodic Words: Given a fixed quasiperiod, the set of infinite quasiperiodic words has subword complexity characterized asymptotically through a constant $\lambda$, the unique largest positive root of a polynomial determined by the code structure derived from the quasiperiod (Polley et al., 2010). A universal upper bound of the same type holds across all quasiperiods in this class, with the extremal constant given as the solution of an explicit polynomial equation.
  • Automatic Sequences and Arithmetical Subword Complexity: For a $k$-automatic sequence $a$ over a finite alphabet $\Omega$, the arithmetical subword complexity $p_a^{\mathrm{AP}}(n)$ counts the length-$n$ words appearing in $a$ along arbitrary arithmetic progressions. Maximal arithmetical subword complexity means $p_a^{\mathrm{AP}}(n) = |\Omega|^n$ for all $n \geq 1$; the automatic sequences attaining it are classified via an effective alphabet size (Konieczny et al., 2023).

Classes of automatic sequences exhibit a dichotomy:

  • Block-additive sequences (e.g., Thue–Morse and other digit sums taken modulo a fixed integer) and certain nondegenerate cases achieve maximal arithmetical complexity.
  • Periodic and forward/backward synchronizing cases have subexponential, in fact polynomial, complexity.
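The contrast between ordinary and arithmetical subword complexity is easy to observe experimentally on the Thue–Morse sequence. The sketch below searches a finite prefix with bounded common difference, so it can only certify that patterns do occur; the prefix length, the difference bound, and the helper names `thue_morse` and `arithmetical_factors` are assumptions made for illustration.

```python
def thue_morse(length):
    """First `length` terms of the Thue-Morse sequence (parity of the binary digit sum)."""
    return [bin(i).count("1") % 2 for i in range(length)]


def arithmetical_factors(seq, n, max_diff):
    """Length-n words read along arithmetic progressions a, a+d, ..., a+(n-1)d
    inside the finite prefix `seq`, for common differences 1 <= d <= max_diff."""
    found = set()
    for d in range(1, max_diff + 1):
        for a in range(len(seq) - (n - 1) * d):
            found.add(tuple(seq[a + j * d] for j in range(n)))
    return found


t = thue_morse(32)
# Ordinary factors (d = 1): the blocks 000 and 111 never occur.
print(len(arithmetical_factors(t, 3, max_diff=1)))  # 6
# Allowing d up to 3 already produces all 2^3 binary patterns of length 3.
print(len(arithmetical_factors(t, 3, max_diff=3)))  # 8
```

For example, positions 0, 3, 6 carry the pattern 000 and positions 1, 4, 7 carry 111, neither of which is a contiguous factor.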

5. Implications for Automatic and Algorithmic Complexity

Maximal subword complexity directly influences the nondeterministic automatic complexity $A_N(w)$, the minimal number of states of an NFA that accepts $w$ along a unique accepting path of length $|w|$. Words with maximal subword complexity, such as m-sequences, force $A_N(x) \geq n/2 - O(\log^2 n)$ for an m-sequence $x$ of length $n$. This is asymptotically close to the worst case for arbitrary words, as $A_N(w) \leq \lfloor n/2 \rfloor + 1$ for all words $w$ of length $n$ (Hyde's result). Thus, m-sequences are nearly extremal for nondeterministic automatic complexity, and the combinatorial extremality is reflected at the automaton level (Kjos-Hanssen, 2016).

The study of maximal subword complexity connects with diverse themes:

  • Pseudorandomness and Linear Complexity: m-sequences are pseudorandom in the linear complexity sense while also being extremal for subword and automatic complexity (Kjos-Hanssen, 2016).
  • Generalizations: $q$-ary analogues (over $\mathbb{F}_q$) realize the corresponding maxima, with $p(n) = q^n$ for $n < k$ and $p(n) = q^k - 1$ for $n \geq k$ in the LFSR context.
  • Open Problems: The size of the $O(\log^2 n)$ gap between $A_N$ for m-sequences and the universal upper bound remains open; counting exact numbers of extremal words for arbitrary $q$ and $N$ also remains unresolved (Anisiu et al., 2010, Kjos-Hanssen, 2016).

7. Representative Examples and Realizations

Explicit constructions and computations for small alphabets and periods exemplify maximal subword complexity:

  • For $q = 2$, $k = 4$, an m-sequence $x$ of period $15$ satisfies $p_x(1) = 2$, $p_x(2) = 4$, $p_x(3) = 8$, and $p_x(n) = 15$ for $n \geq 4$ (counting factors cyclically), matching the combinatorial upper bounds.
  • For larger alphabets, the sequence generated by an LFSR with primitive characteristic polynomial over $\mathbb{F}_q$ achieves $p_x(n) = q^n$ for $n < k$ and $p_x(n) = q^k - 1$ for $n \geq k$ (Kjos-Hanssen, 2016).
  • Thue–Morse and similar automatic sequences realize maximal arithmetical subword complexity via Gowers uniformity and Fourier-theoretic arguments, ensuring every pattern appears along an arithmetic progression (Konieczny et al., 2023).
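A non-binary instance of the LFSR example can be sketched in the same spirit. The code below assumes a prime alphabet size $q$, so that arithmetic mod $q$ is field arithmetic, and uses the polynomial $x^2 + x + 2$ over $\mathbb{F}_3$, which is primitive and gives the recurrence $s_{t+2} = 2 s_{t+1} + s_t \pmod 3$. This polynomial is chosen here for illustration and is not necessarily the one used in the cited paper; the function names are likewise hypothetical.

```python
def lfsr_sequence(coeffs, q, seed, length):
    """Linear recurrence s_{t+k} = sum_j coeffs[j] * s_{t+j} (mod q), q prime.

    `seed` gives s_0, ..., s_{k-1}; the first `length` terms are returned.
    """
    state = list(seed)
    out = []
    for _ in range(length):
        out.append(state[0])
        new_symbol = sum(c * s for c, s in zip(coeffs, state)) % q
        state = state[1:] + [new_symbol]
    return out


def cyclic_complexity(x, n):
    """Number of distinct length-n factors of x read cyclically."""
    period = len(x)
    return len({tuple(x[(i + j) % period] for j in range(n)) for i in range(period)})


# Assumed example: x^2 + x + 2 over F_3, i.e. s_{t+2} = 2*s_{t+1} + s_t (mod 3).
q, k = 3, 2
x = lfsr_sequence(coeffs=(1, 2), q=q, seed=(0, 1), length=q ** k - 1)
print(x)                                               # [0, 1, 2, 2, 0, 2, 1, 1]
print([cyclic_complexity(x, n) for n in range(1, 4)])  # [3, 8, 8]
```

All three letters occur, and every pair except 00 occurs exactly once per period, which matches $p(n) = q^n$ for $n < k$ and $p(n) = q^k - 1$ for $n \geq k$.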

These examples illustrate the structural and combinatorial richness of maximal subword complexity and its deep interconnections across automata, combinatorics, and algebraic constructions.
