Maximal Subword Complexity
- Maximal subword complexity is a measure that quantifies the maximal count of distinct contiguous subwords (factors) in a finite or infinite sequence over a finite alphabet.
- It is defined through the subword complexity function p(k), which counts the distinct factors of length k; de Bruijn words and m-sequences are typical extremal examples.
- The concept links combinatorics on words with automata theory, cryptography, and sequence design, notably through extremal sequence constructions and nondeterministic automatic complexity.
Maximal subword complexity quantifies, for a finite or infinite word over a finite alphabet, the maximal possible number of distinct contiguous subwords (also called factors or blocks) of given length. This notion captures the combinatorial extremality of certain sequences, providing a central tool in the analysis of sequences with high combinatorial, algorithmic, or pseudorandom structure. Maximal subword complexity is critical in fields such as combinatorics on words, finite automata theory, and sequence design for communication and cryptography.
1. Core Definitions and General Properties
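Let $A$ be a finite alphabet of cardinality $q \ge 2$, and let $w \in A^n$ be a finite word of length $n$. The subword complexity function $p_w(k)$ counts the number of distinct contiguous factors of $w$ of length $k$:
$$p_w(k) = \left|\{\, w_i w_{i+1} \cdots w_{i+k-1} : 1 \le i \le n - k + 1 \,\}\right|.$$
The maximal complexity of a word $w$ is defined as
$$C(w) = \max_{1 \le k \le n} p_w(k).$$
The global maximal complexity for fixed $q$ and $n$ is
$$K_q(n) = \max_{w \in A^n} C(w),$$
and the set of block lengths at which this maximum is attained is denoted by
$$M_q(n) = \{\, k : p_w(k) = K_q(n) \text{ for some } w \in A^n \,\}.$$
$N_q(n)$ denotes the number of words of length $n$ attaining $K_q(n)$. For infinite words $x$, the subword complexity at $k$ is $p_x(k)$, the number of distinct factors of $x$ of length $k$.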
Maximal subword complexity is governed by the trivial combinatorial upper bound $p_w(k) \le \min(q^k,\ n - k + 1)$ for words $w$ of length $n$, since at most $q^k$ words of length $k$ exist and a word of length $n$ has only $n - k + 1$ starting positions for a factor of length $k$ (Anisiu et al., 2010).
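The definitions and the trivial bound can be checked directly on small examples. The following is a minimal Python sketch; the helper names `factors`, `complexity`, `max_complexity` and `trivial_bound` are hypothetical, words are plain strings, and block lengths run over $1 \le k \le n$.

```python
def factors(w: str, k: int) -> set:
    """Set of distinct contiguous factors of w of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def complexity(w: str, k: int) -> int:
    """Subword complexity p_w(k)."""
    return len(factors(w, k))


def max_complexity(w: str) -> int:
    """Maximal complexity C(w) = max over 1 <= k <= |w| of p_w(k)."""
    return max(complexity(w, k) for k in range(1, len(w) + 1))


def trivial_bound(q: int, n: int, k: int) -> int:
    """Upper bound min(q^k, n - k + 1) on p_w(k) for any word of length n over q letters."""
    return min(q ** k, n - k + 1)


w = "0001011100"  # binary example word of length 10
n = len(w)
print([complexity(w, k) for k in range(1, n + 1)])  # [2, 4, 8, 7, 6, 5, 4, 3, 2, 1]
print(max_complexity(w), max(trivial_bound(2, n, k) for k in range(1, n + 1)))  # 8 8
```

In this example $p_w(k)$ meets the bound at every $k$, so $C(w) = 8 = K_2(10)$.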
2. Extremal Sequences and Characterizing Maximal Complexity
De Bruijn–Martin Words: Extremal words attaining $K_q(n)$ are obtained as prefixes of de Bruijn words (de Bruijn–Martin words). A de Bruijn word of order $k$ is a shortest word containing every word of length $k$ as a factor. It has length $q^k + k - 1$, so each length-$k$ word occurs in it exactly once. The extremal word construction proceeds as follows:
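- For $q^{k-1} + k - 1 \le n \le q^k + k - 1$, the value $K_q(n) = n - k + 1$ is achieved at block length $k$.
- A prefix of length $n$ of a de Bruijn word of order $k$ attains this value, since its $n - k + 1$ factors of length $k$ are pairwise distinct.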
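The construction above can be illustrated on small parameters. This sketch does not use Martin's greedy construction. Instead it substitutes the standard Lyndon-word (Fredricksen–Kessler–Maiorana) algorithm to produce a cyclic de Bruijn sequence, then appends its first $k-1$ symbols to obtain a linear de Bruijn word. The function names are hypothetical.

```python
def de_bruijn_cycle(q: int, k: int) -> list:
    """Cyclic de Bruijn sequence of order k over {0, ..., q-1} (length q**k),
    built by concatenating Lyndon words whose length divides k (FKM algorithm)."""
    a = [0] * (q * k)
    seq = []

    def db(t: int, p: int) -> None:
        if t > k:
            if k % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, q):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq


def de_bruijn_word(q: int, k: int) -> list:
    """Linear de Bruijn word of length q**k + k - 1: every length-k word occurs exactly once."""
    cycle = de_bruijn_cycle(q, k)
    return cycle + cycle[:k - 1]


def count_factors(w: list, k: int) -> int:
    """p_w(k) for a word given as a list of symbols."""
    return len({tuple(w[i:i + k]) for i in range(len(w) - k + 1)})


q, k = 2, 3
word = de_bruijn_word(q, k)
print("".join(map(str, word)))  # 0001011100
# Every prefix of length n in [q^(k-1) + k - 1, q^k + k - 1] attains p(k) = n - k + 1.
for n in range(q ** (k - 1) + k - 1, q ** k + k):
    assert count_factors(word[:n], k) == n - k + 1
```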
m-Sequences (Maximal-Length LFSR Sequences): For binary words produced by an $m$-stage linear feedback shift register (LFSR) with primitive characteristic polynomial (so-called m-sequences), maximal cyclic subword complexity is achieved. For period $N = 2^m - 1$, $p(k) = 2^k$ for $k < m$, and $p(k) = 2^m - 1$ for $k \ge m$ (Kjos-Hanssen, 2016). This reflects the property that over one period the LFSR generates every nonzero $m$-tuple exactly once (Golomb's sliding-window property).
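The window property can be observed on a small instance. This Python sketch assumes the primitive polynomial $x^4 + x + 1$ (so $m = 4$, period $15$). It implements a Fibonacci-style LFSR with the recurrence $s_{i+4} = s_{i+1} \oplus s_i$ and counts factors of the periodic word cyclically. The helper names `lfsr` and `cyclic_complexity` are hypothetical.

```python
def lfsr(taps: list, seed: list, length: int) -> list:
    """Fibonacci LFSR over GF(2): s[i + m] = XOR of s[i + t] for t in taps, m = len(seed)."""
    s = list(seed)
    m = len(seed)
    while len(s) < length:
        i = len(s) - m
        s.append(sum(s[i + t] for t in taps) % 2)
    return s[:length]


def cyclic_complexity(period: list, k: int) -> int:
    """Number of distinct length-k factors of the periodic word with this period."""
    N = len(period)
    return len({tuple(period[(i + j) % N] for j in range(k)) for i in range(N)})


# Assumed primitive polynomial x^4 + x + 1, giving the recurrence s[i+4] = s[i+1] XOR s[i].
m = 4
period = lfsr(taps=[0, 1], seed=[0, 0, 0, 1], length=2 ** m - 1)
print("".join(map(str, period)))  # 000100110101111
print([cyclic_complexity(period, k) for k in range(1, 7)])  # [2, 4, 8, 15, 15, 15]
```

The counts match $p(k) = 2^k$ for $k < 4$ and $p(k) = 2^4 - 1 = 15$ for $k \ge 4$.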
Classes: De Bruijn words and (by extension) m-sequences represent sequences achieving the combinatorial upper bound for subword complexity; they play a foundational role in sequence design and extremal combinatorics (Anisiu et al., 2010, Kjos-Hanssen, 2016).
3. Exact Formulae, Enumeration, and Graph Structure
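The enumeration and structural theory of maximal complexity are tightly connected to directed de Bruijn graphs $B(q,k)$. The vertices are the $q^k$ words of length $k$. There is an edge $au \to ub$ for each word $aub$ of length $k + 1$ (with $a, b \in A$), so every vertex has out-degree $q$. A word whose length-$k$ factors are pairwise distinct corresponds to a directed path in $B(q,k)$ that visits no vertex twice. For $n = q^k + k - 1$ this path visits every vertex and closes up into a Hamiltonian cycle (a de Bruijn cycle).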
Key Results (Anisiu et al., 2010):
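- For $q^{k-1} + k \le n \le q^k + k - 1$, the maximal complexity is $K_q(n) = n - k + 1$, attained only at block length $k$, i.e. $M_q(n) = \{k\}$.
- For such $n$:
- $N_q(n)$ equals the number of directed paths of length $n - k$ (that is, with $n - k + 1$ pairwise distinct vertices) in $B(q,k)$.
- For $n = q^k + k - 1$, the extremal words are exactly the de Bruijn words of order $k$, i.e. the Hamiltonian paths of $B(q,k)$. Each closes up into a Hamiltonian cycle. There are $(q!)^{q^{k-1}}/q^k$ such cycles, each cut open in $q^k$ ways, so $N_q(n) = (q!)^{q^{k-1}}$; for $q = 2$, $N_2(n) = 2^{2^{k-1}}$.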
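For tiny parameters the quantities $K_q(n)$, $M_q(n)$ and $N_q(n)$ in this list can be cross-checked by exhaustive search. The sketch below is a brute force over all $q^n$ words, feasible only for very small $n$. The helper names are hypothetical, and no closed-form formula is used.

```python
from itertools import product


def count_factors(w: tuple, k: int) -> int:
    """p_w(k): number of distinct length-k factors of w."""
    return len({w[i:i + k] for i in range(len(w) - k + 1)})


def extremal_stats(q: int, n: int) -> tuple:
    """Exhaustively compute (K_q(n), M_q(n), N_q(n)) over all q**n words of length n."""
    best, lengths, count = 0, set(), 0
    for w in product(range(q), repeat=n):
        profile = [count_factors(w, k) for k in range(1, n + 1)]
        c = max(profile)  # C(w)
        if c > best:  # new maximum: reset the attaining set and the counter
            best, lengths, count = c, set(), 0
        if c == best:
            count += 1
            lengths |= {k for k, v in enumerate(profile, start=1) if v == c}
    return best, lengths, count


for n in range(1, 11):
    K, M, N = extremal_stats(2, n)
    print(f"n={n:2d}  K_2(n)={K}  M_2(n)={sorted(M)}  N_2(n)={N}")
```

For example, $n = 5$ and $n = 10$ are the de Bruijn lengths $q^k + k - 1$ for $k = 2, 3$. There the search returns $N_2(n) = 4$ and $16$, in line with $2^{2^{k-1}}$.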
This correspondence yields explicit counts for small $n$, as shown in the table below for binary words ($q = 2$).
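For general $q$ and $n$, the enumeration of extremal words reduces to counting directed paths without repeated vertices in the appropriate de Bruijn graph. Powers of the adjacency matrix do not suffice here, because they count walks. Every word of length $n$ is a walk with $n - k$ edges in $B(q,k)$, so the distinctness constraint is lost. Counting such paths lacks a closed form in $n$ in the general case.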
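The difference between walks and paths can be made concrete. In this hypothetical sketch, `count_walks` propagates walk counts through $B(q,k)$, which is equivalent to summing the entries of $A^{e}$ for the adjacency matrix $A$. `count_paths` enumerates paths without repeated vertices by depth-first search. For $q^{k-1} + k \le n \le q^k + k - 1$, the latter with $e = n - k$ edges gives $N_q(n)$.

```python
from itertools import product


def de_bruijn_graph(q: int, k: int) -> dict:
    """B(q, k): vertices are length-k tuples; u -> u[1:] + (b,) for each letter b."""
    return {u: [u[1:] + (b,) for b in range(q)] for u in product(range(q), repeat=k)}


def count_walks(q: int, k: int, edges: int) -> int:
    """Number of walks with `edges` edges (vertices may repeat): sum of entries of A**edges."""
    g = de_bruijn_graph(q, k)
    ways = {u: 1 for u in g}  # walks of length 0 ending at each vertex
    for _ in range(edges):
        nxt = {u: 0 for u in g}
        for u, outs in g.items():
            for v in outs:
                nxt[v] += ways[u]
        ways = nxt
    return sum(ways.values())


def count_paths(q: int, k: int, edges: int) -> int:
    """Number of directed paths with `edges` edges and no repeated vertex (depth-first search)."""
    g = de_bruijn_graph(q, k)

    def extend(u, seen: set, remaining: int) -> int:
        if remaining == 0:
            return 1
        # self-loops and revisits are excluded by the `seen` check
        return sum(extend(v, seen | {v}, remaining - 1) for v in g[u] if v not in seen)

    return sum(extend(u, {u}, edges) for u in g)


q, k = 2, 3
n = q ** k + k - 1  # length of a de Bruijn word of order k
print(count_walks(q, k, n - k), q ** n)  # 1024 1024: every word of length n is a walk
print(count_paths(q, k, n - k))          # 16: Hamiltonian paths = de Bruijn words
```

The walk count always equals $q^n$ and therefore carries no information about extremality. The depth-first search is exponential in general, which is consistent with the absence of a closed form.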
4. Maximal Complexity for Infinite and Structured Words
For infinite sequences $x$, the growth of the subword complexity $p_x(k)$ as $k \to \infty$ is of primary interest.
- Quasiperiodic Words: Fix a finite word $u$ as quasiperiod. The infinite words with quasiperiod $u$ (words in which every position is covered by an occurrence of $u$) have maximal subword complexity of exponential order $t_u^{\,k}$. Here $t_u$ is the unique largest positive root of a polynomial determined by a code derived from $u$ (Polley et al., 2010). A universal upper bound on this growth over all quasiperiods is also given, in terms of the root of an explicit polynomial.
- Automatic Sequences and Arithmetical Subword Complexity: For $b$-automatic sequences $x$ over an alphabet $\Sigma$, the arithmetical subword complexity $a_x(k)$ counts the length-$k$ words $x(a)\,x(a+d) \cdots x(a+(k-1)d)$ with $a \ge 0$ and $d \ge 1$, i.e. the words read along arbitrary arithmetic progressions. Maximal arithmetical subword complexity means $a_x(k) = |\Sigma|^k$ for all $k$. It is classified in terms of an effective alphabet size attached to the sequence (Konieczny et al., 2023).
Classes of automatic sequences exhibit a dichotomy:
- Block-additive sequences (e.g., Thue–Morse, or the sum of digits modulo a fixed integer) and certain nondegenerate cases achieve maximal arithmetical complexity.
- Periodic and forward/backward synchronizing cases have much smaller arithmetical complexity, polynomial and in particular subexponential in $k$.
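Arithmetical subword complexity can be probed numerically. The sketch below (helper names hypothetical) enumerates words read along arithmetic progressions of a sequence within a finite search window, so it yields only a lower bound on $a_x(k)$. It contrasts the Thue–Morse sequence with the periodic sequence $0101\ldots$, whose arithmetic subsequences are themselves periodic.

```python
def thue_morse(i: int) -> int:
    """t(i): parity of the number of 1 bits in the binary expansion of i."""
    return bin(i).count("1") % 2


def periodic(i: int) -> int:
    """The periodic sequence 0101..., used for contrast."""
    return i % 2


def arithmetic_words(x, length: int, max_start: int, max_diff: int) -> set:
    """Words x(a) x(a+d) ... x(a+(length-1)d) for 0 <= a < max_start and 1 <= d <= max_diff.
    Only a finite window is searched, so the set size is a lower bound on a_x(length)."""
    return {
        tuple(x(a + j * d) for j in range(length))
        for a in range(max_start)
        for d in range(1, max_diff + 1)
    }


for k in range(1, 6):
    tm = len(arithmetic_words(thue_morse, k, 64, 64))
    per = len(arithmetic_words(periodic, k, 64, 64))
    print(k, tm, 2 ** k, per)
```

For Thue–Morse the counts for $k \le 3$ already reach $2^k$ within the window, consistent with maximal arithmetical complexity. For the periodic sequence they stay at $4$ for every $k \ge 2$: an even difference reads a constant word, and an odd difference reads an alternating word.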
5. Implications for Automatic and Algorithmic Complexity
Maximal subword complexity directly influences the nondeterministic automatic complexity $A_N(x)$. This is the minimal number of states of an NFA that accepts $x$ and has exactly one accepting path of length $|x|$. For words $x$ of length $n$ with maximal subword complexity, such as m-sequences, one has $n/2 - A_N(x) = O(\log^2 n)$. This is asymptotically close to the worst case for arbitrary words, since $A_N(y) \le n/2 + 1$ for every word $y$ of length $n$ (Hyde's result). Thus m-sequences are nearly extremal for nondeterministic automatic complexity, and their combinatorial extremality is reflected at the automaton level (Kjos-Hanssen, 2016).
6. Broader Context and Related Structures
The study of maximal subword complexity connects with diverse themes:
- Pseudorandomness and Linear Complexity: m-sequences are pseudorandom in the linear complexity sense while also being extremal for subword and automatic complexity (Kjos-Hanssen, 2016).
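- Generalizations: $q$-ary analogues (over the finite field $\mathbb{F}_q$, $q$ a prime power) realize the corresponding maxima, with $p(k) = q^k$ for $k < m$ and $p(k) = q^m - 1$ for $k \ge m$ in the LFSR context.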
- Open Problems: Closing the $O(\log^2 n)$ gap between $A_N(x)$ for m-sequences and the universal upper bound $n/2 + 1$ remains open. Counting the exact number of extremal words $N_q(n)$ for arbitrary $q$ and $n$ also remains unresolved (Anisiu et al., 2010, Kjos-Hanssen, 2016).
7. Representative Examples and Realizations
Explicit constructions and computations for small alphabets and periods exemplify maximal subword complexity:
- For $q = 2$, $m = 4$ (period $15$), an m-sequence satisfies $p(1) = 2$, $p(2) = 4$, $p(3) = 8$, and $p(k) = 15$ for $k \ge 4$, matching the combinatorial upper bounds $\min(2^k, 15)$.
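- For a $q$-ary alphabet, an m-sequence over $\mathbb{F}_q$ generated by a primitive polynomial of degree $m$ attains the analogous values $p(k) = q^k$ for $k < m$ and $p(k) = q^m - 1$ for $k \ge m$ (Kjos-Hanssen, 2016).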
- Thue–Morse and similar automatic sequences have maximal arithmetical subword complexity. Gowers-uniformity and Fourier-analytic arguments show that every finite word over the alphabet appears along some arithmetic progression (Konieczny et al., 2023).
These examples illustrate the structural and combinatorial richness of maximal subword complexity and its deep interconnections across automata, combinatorics, and algebraic constructions.