Maximal Subword Complexity

Updated 6 February 2026
  • Maximal subword complexity is a measure that quantifies the maximal count of distinct contiguous subwords (factors) in a finite or infinite sequence over a finite alphabet.
  • It is computed using the subword complexity function p(n), providing insights into the combinatorial extremality of sequences, exemplified by de Bruijn words and m-sequences.
  • This concept influences various fields such as automata theory, cryptography, and sequence design by revealing links to extremal properties and nondeterministic automatic complexity.

Maximal subword complexity quantifies, for a finite or infinite word over a finite alphabet, the maximal possible number of distinct contiguous subwords (also called factors or blocks) of a given length. This notion captures the combinatorial extremality of certain sequences and is a central tool in the analysis of sequences with high combinatorial, algorithmic, or pseudorandom structure. It is used in combinatorics on words, finite automata theory, and sequence design for communication and cryptography.

1. Core Definitions and General Properties

Let $\mathcal{A}$ be a finite alphabet of cardinality $q$, and let $w \in \mathcal{A}^N$ be a finite word of length $N$. The subword complexity function $p_w(n)$ counts the number of distinct contiguous factors of $w$ of length $n$:

$$p_w(n) = |\{ w_{i}w_{i+1}\dots w_{i+n-1} : 1 \leq i \leq N-n+1 \}|, \quad 1 \leq n \leq N.$$

The maximal complexity of a word $w$ is defined as

$$C(w) = \max_{1 \leq n \leq |w|} p_w(n).$$

The global maximal complexity for fixed $N$ is

$$K(N) = \max_{w \in \mathcal{A}^N} C(w),$$

and the set of block lengths at which this maximum is attained is denoted by

$$R(N) = \left\{ n : 1 \leq n \leq N,\ \exists w \in \mathcal{A}^N,\ p_w(n)=K(N) \right\}.$$

$M(N)$ denotes the number of words of length $N$ attaining $K(N)$. For infinite words $\xi \in \mathcal{A}^\omega$, the subword complexity at $n$ is $C_\xi(n) = |\{ u \in \mathcal{A}^n : u \text{ appears as a factor in } \xi \}|$.

Maximal subword complexity is governed by the trivial combinatorial upper bound $p_w(n) \leq \min\{q^n,\, N-n+1\}$ for words of length $N$, since at most $q^n$ words of length $n$ exist and a word of length $N$ has exactly $N-n+1$ windows of length $n$ (Anisiu et al., 2010).
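As a concrete illustration of these definitions, the following minimal Python sketch computes $p_w(n)$ and $C(w)$ for a word given as a string and checks each value against $\min\{q^n, N-n+1\}$, using the fact that a word of length $N$ has exactly $N-n+1$ windows of length $n$. The function names are illustrative and are not taken from the cited papers.

```python
def subword_complexity(w, n):
    """Number of distinct contiguous factors of length n in the word w."""
    return len({w[i:i + n] for i in range(len(w) - n + 1)})


def maximal_complexity(w):
    """C(w): the largest value of p_w(n) over 1 <= n <= |w|."""
    return max(subword_complexity(w, n) for n in range(1, len(w) + 1))


w = "0011"
q = len(set(w))  # number of letters actually used by w (here 2)
N = len(w)
for n in range(1, N + 1):
    p = subword_complexity(w, n)
    # Trivial bound: at most q^n words of length n, and N - n + 1 windows.
    assert p <= min(q ** n, N - n + 1)
    print(n, p)
print("C(w) =", maximal_complexity(w))
```

For `w = "0011"` the maximum is reached at $n = 2$, where the three windows `00`, `01`, `11` are all distinct.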

2. Extremal Sequences and Characterizing Maximal Complexity

De Bruijn–Martin Words: Extremal words achieving $K(N)$ are precisely prefixes of de Bruijn words (de Bruijn–Martin words). A de Bruijn word of order $m$ is a shortest word containing all $q^m$ length-$m$ factors exactly once. The extremal word construction proceeds as follows:

  • For $q^k + k \leq N \leq q^{k+1} + k$, $K(N) = N - k$ is achieved at $n = k + 1$.
  • Such $w$ is a prefix of a de Bruijn word of order $k+1$.
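The prefix construction can be sketched in Python as follows. The source does not say which de Bruijn construction it has in mind; this sketch assumes the standard Lyndon-word-based recursive method, and the helper names are hypothetical. It builds a cyclic de Bruijn sequence of order $k+1$, unrolls it into a linear de Bruijn word, and counts the distinct factors of length $k+1$ in each prefix of length $N$ with $q^k + k \leq N \leq q^{k+1} + k$.

```python
def de_bruijn_cycle(q, m):
    """Cyclic de Bruijn sequence of order m over {0, ..., q-1} (Lyndon-word method)."""
    a = [0] * (q * m)
    seq = []

    def db(t, p):
        if t > m:
            if m % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, q):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq


def distinct_factors(w, n):
    """Number of distinct length-n windows in the list w."""
    return len({tuple(w[i:i + n]) for i in range(len(w) - n + 1)})


q, k = 2, 2
cycle = de_bruijn_cycle(q, k + 1)
linear = cycle + cycle[:k]  # linear de Bruijn word of length q^(k+1) + k
for N in range(q ** k + k, q ** (k + 1) + k + 1):
    prefix = linear[:N]
    print(N, distinct_factors(prefix, k + 1))  # each line shows N and N - k
```

Every window of length $k+1$ in a linear de Bruijn word of order $k+1$ is distinct, so every prefix inherits this property and reaches $N - k$ distinct factors.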

m-Sequences (Maximal-Length LFSR Sequences): For binary words produced by a $k$-stage linear feedback shift register (LFSR) with primitive characteristic polynomial (producing so-called m-sequences), maximal cyclic subword complexity is achieved: for period $n=2^k-1$, $p_{x^\infty}(\ell)=2^\ell$ for $1 \leq \ell \leq k-1$, and $p_{x^\infty}(\ell)=n$ for $k \leq \ell \leq n$ (Kjos-Hanssen, 2016). This reflects the property that the LFSR, over one period, generates all nonzero $k$-tuples precisely once (Golomb’s sliding-window property).
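The cyclic complexity profile of an m-sequence can be checked numerically. The sketch below assumes the recurrence $s_{n+4} = s_{n+1} + s_n \pmod 2$, one common way of reading the polynomial $x^4 + x + 1$, with an arbitrarily chosen nonzero initial state. The function names and the tap convention are assumptions made for illustration.

```python
def lfsr_period(taps, init):
    """One period of the binary recurrence s[n+k] = sum of s[n+t] for t in taps (mod 2).

    Assumes the underlying polynomial is primitive, so the period is 2^k - 1.
    """
    k = len(init)
    s = list(init)
    for n in range(2 ** k - 1 - k):
        s.append(sum(s[n + t] for t in taps) % 2)
    return s


def cyclic_complexity(x, ell):
    """Number of distinct length-ell windows of the periodic word x^infinity."""
    n = len(x)
    return len({tuple(x[(i + j) % n] for j in range(ell)) for i in range(n)})


x = lfsr_period(taps=(0, 1), init=(0, 0, 0, 1))  # s[n+4] = s[n] + s[n+1]
profile = [cyclic_complexity(x, ell) for ell in range(1, len(x) + 1)]
print("".join(map(str, x)))
print(profile)  # 2, 4, 8, then 15 for every ell from 4 to 15
```

The profile doubles until $\ell = k - 1$ and then saturates at the period $2^k - 1$, which is the behaviour described above.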

Classes: De Bruijn words and (by extension) m-sequences represent sequences achieving the combinatorial upper bound for subword complexity; they play a foundational role in sequence design and extremal combinatorics (Anisiu et al., 2010, Kjos-Hanssen, 2016).

3. Exact Formulae, Enumeration, and Graph Structure

The enumeration and structural theory of maximal complexity are tightly connected to directed de Bruijn graphs. The vertices of $B(q,k)$ are the $q^k$ possible length-$k$ words; edges correspond to possible one-letter extensions. An extremal word corresponds to a path that does not repeat vertices (a Hamiltonian path or cycle for $N=q^k+k-1$, where every vertex is visited).

Key Results (Anisiu et al., 2010):

  • The maximal complexity $K(N)$ for $q^k + k \leq N \leq q^{k+1} + k$ is $K(N)=N-k$, attained for $n = k+1$.
  • For such $N$:
    • $M(N)$ = number of directed paths of length $N-k-1$ in $B(q,k+1)$.
    • For $N=q^k+k-1$, $M(N)$ counts Hamiltonian cycles: for $q=2$, $M(2^k+k-1) = 2^{2^{k-1}}$.
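The path-counting description of $M(N)$ can be made concrete with a small depth-first search. This sketch assumes that the vertices of $B(q, m)$ are the words of length $m$, that there is an edge from $u$ to $u_2 \dots u_m s$ for each letter $s$, and that "path" means a walk with no repeated vertex, so a path of length $N-k-1$ visits $N-k$ distinct vertices. The function name is hypothetical.

```python
from itertools import product


def count_simple_paths(q, m, num_vertices):
    """Count directed paths visiting num_vertices distinct vertices in B(q, m)."""

    def extend(end, visited, remaining):
        if remaining == 0:
            return 1
        total = 0
        for s in range(q):
            nxt = end[1:] + (s,)  # shift left and append the new letter
            if nxt not in visited:
                visited.add(nxt)
                total += extend(nxt, visited, remaining - 1)
                visited.remove(nxt)
        return total

    return sum(
        extend(v, {v}, num_vertices - 1)
        for v in product(range(q), repeat=m)
    )


# Binary case, k = 2: paths in B(2, 3) with N - k vertices.
print(count_simple_paths(2, 3, 5))  # N = 7
print(count_simple_paths(2, 3, 8))  # N = 10, Hamiltonian paths
```

For $N = 10$ the search counts Hamiltonian paths in $B(2,3)$ and returns $16 = 2^{2^{2}}$, in agreement with the closed form for $q = 2$, $k = 3$.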

This correspondence yields explicit counts for small $N$, as shown in the table below for binary words at selected lengths $N \leq 10$:

| $N$ | $K(N)$ | $R(N)$ | $M(N)$ |
|---|---|---|---|
| 1 | 1 | {1} | 2 |
| 4 | 3 | {2} | 8 |
| 7 | 5 | {3} | 42 |
| 10 | 8 | {3} | 16 |

For general $q$ and $N$, the enumeration of extremal words reduces to counting paths without repeated vertices in the appropriate de Bruijn graph. Adjacency matrix powers count walks rather than such paths, so they give only an upper bound on $M(N)$, and no closed form in $N$ is available in the general case.
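For small binary cases, the quantities $K(N)$, $R(N)$, and $M(N)$ can be recomputed directly from their definitions by exhaustive search over all $q^N$ words. The sketch below does this without using the de Bruijn graph; the function name is illustrative, and the approach is only practical for small $N$.

```python
from itertools import product


def extremal_statistics(N, q=2):
    """Return (K(N), R(N), M(N)) by enumerating all q^N words of length N."""
    profiles = [
        [len({w[i:i + n] for i in range(N - n + 1)}) for n in range(1, N + 1)]
        for w in product(range(q), repeat=N)
    ]
    K = max(max(p) for p in profiles)
    # Block lengths n at which some word reaches K(N).
    R = sorted({n for p in profiles for n, v in enumerate(p, start=1) if v == K})
    # Number of words whose maximal complexity equals K(N).
    M = sum(1 for p in profiles if max(p) == K)
    return K, R, M


for N in (1, 4, 7, 10):
    print(N, extremal_statistics(N))
```

The four printed rows reproduce the binary values $K(N) = 1, 3, 5, 8$ and $M(N) = 2, 8, 42, 16$.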

4. Maximal Complexity for Infinite and Structured Words

For infinite sequences, the subword complexity $C_\xi(n)$ is of primary interest.

  • Quasiperiodic Words: Given a fixed quasiperiod $q$, the set of infinite quasiperiodic words has subword complexity characterized asymptotically as $C_\xi(n) = \Theta(\lambda_q^n)$, where $\lambda_q$ is the unique largest positive root of a polynomial determined by the code structure derived from $q$ (Polley et al., 2010). The universal upper bound within this class is $t_P^n$, $t_P \approx 1.324718$, with $t_P$ solving $t^3 - t - 1 = 0$.
  • Automatic Sequences and Arithmetical Subword Complexity: For $k$-automatic sequences $a$ over alphabet $\Omega$, the arithmetical subword complexity $p_a^{\mathrm{arith}}(\ell)$ counts length-$\ell$ words appearing along arbitrary arithmetic progressions. Maximal arithmetical subword complexity is defined by $p_a^{\mathrm{arith}}(\ell) = |\Omega|^\ell$ for all $\ell$, and is classified via the effective alphabet size $r(a)$: maximal complexity occurs if and only if $r(a) = |\Omega|$ (Konieczny et al., 2023).
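For the quasiperiodic bound, the constant $t_P$ can be checked numerically as the real root of $t^3 - t - 1 = 0$. The sketch below uses plain bisection on the interval $[1, 2]$, where the polynomial changes sign; the function name and tolerance are illustrative choices.

```python
def plastic_root(tol=1e-12):
    """Real root of t^3 - t - 1 = 0, found by bisection on [1, 2]."""
    lo, hi = 1.0, 2.0  # f(1) = -1 < 0 and f(2) = 5 > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid ** 3 - mid - 1 < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


t_P = plastic_root()
print(round(t_P, 6))  # 1.324718
# Growth of the bound t_P^n compared with the full binary count 2^n.
for n in (10, 20, 40):
    print(n, t_P ** n, 2 ** n)
```

The comparison with $2^n$ shows how far below the unrestricted binary maximum the quasiperiodic class stays.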

Classes of automatic sequences exhibit a dichotomy:

  • Block-additive (e.g., Thue–Morse, digital sum modulo mm) and certain nondegenerate cases achieve maximal arithmetical complexity.
  • Periodic and forward/backward synchronizing cases have polynomial (hence subexponential) complexity, with $r(a)=1$.
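The difference between ordinary and arithmetical subword complexity can be seen on a finite prefix of the Thue–Morse sequence. The sketch below scans arithmetic progressions with bounded difference inside a prefix, so the count it returns is only a lower bound on $p_a^{\mathrm{arith}}(\ell)$. The prefix length, the difference bound, and the function names are arbitrary illustrative choices.

```python
def thue_morse(length):
    """First `length` terms of Thue-Morse: parity of the binary digit sum of i."""
    return [bin(i).count("1") % 2 for i in range(length)]


def progression_complexity(seq, ell, max_diff):
    """Distinct words (seq[a], seq[a+d], ..., seq[a+(ell-1)d]) with 1 <= d <= max_diff."""
    seen = set()
    for d in range(1, max_diff + 1):
        for a in range(len(seq) - (ell - 1) * d):
            seen.add(tuple(seq[a + j * d] for j in range(ell)))
    return len(seen)


t = thue_morse(64)
for ell in (1, 2, 3, 4):
    ordinary = progression_complexity(t, ell, max_diff=1)  # contiguous factors
    arithmetic = progression_complexity(t, ell, max_diff=8)
    print(ell, ordinary, arithmetic, 2 ** ell)
```

For $\ell = 3$, the contiguous factors miss `000` and `111`, whereas the progressions with differences 5 and 3 already supply them, giving all $2^3$ patterns.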

5. Implications for Automatic and Algorithmic Complexity

Maximal subword complexity directly influences the nondeterministic automatic complexity $A_N(x)$, the minimal state count in an NFA with a unique accepting path consuming $x$. Words with maximal subword complexity, such as m-sequences, force $A_N(x)\approx |x|/2 - O(\log^2 |x|)$. This is asymptotically close to the worst case for arbitrary words, as $A_N(y)\leq \lfloor n/2 \rfloor + 1$ for all words $y$ of length $n$ (Hyde’s result). Thus, m-sequences are nearly extremal for nondeterministic automatic complexity, and the combinatorial extremality is reflected at the automaton level (Kjos-Hanssen, 2016).

The study of maximal subword complexity connects with diverse themes:

  • Pseudorandomness and Linear Complexity: m-sequences are pseudorandom in the linear complexity sense while also being extremal for subword and automatic complexity (Kjos-Hanssen, 2016).
  • Generalizations: $q$-ary analogues (over $\mathbb{F}_q$) realize the corresponding maxima, with $p(\ell)=q^\ell$ for $\ell < k$, $p(\ell)=q^k-1$ for $\ell \geq k$ in the LFSR context.
  • Open Problems: The gap in the $O(\log^2 n)$ term for $A_N$ for m-sequences versus the universal upper bound remains open; counting exact numbers of extremal words for arbitrary $N$ and $q$ also remains unresolved (Anisiu et al., 2010, Kjos-Hanssen, 2016).
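The $q$-ary generalization listed above can be illustrated with a small ternary example. The polynomial $x^2 + x + 2$ over $\mathbb{F}_3$ is an assumption chosen for this sketch and is not taken from the source; it is primitive, and it is read here as the recurrence $s_{n+2} = 2 s_{n+1} + s_n \pmod 3$. The function names are hypothetical.

```python
def lfsr_period_q(coeffs, init, q):
    """One period of s[n+k] = sum(coeffs[j] * s[n+j]) mod q.

    Assumes the recurrence comes from a primitive polynomial over F_q,
    so that the period is q^k - 1.
    """
    k = len(init)
    s = list(init)
    for n in range(q ** k - 1 - k):
        s.append(sum(c * s[n + j] for j, c in enumerate(coeffs)) % q)
    return s


def cyclic_complexity(x, ell):
    """Number of distinct length-ell windows of the periodic word x^infinity."""
    n = len(x)
    return len({tuple(x[(i + j) % n] for j in range(ell)) for i in range(n)})


# Ternary example with k = 2: s[n+2] = 1*s[n] + 2*s[n+1] (mod 3).
x3 = lfsr_period_q(coeffs=(1, 2), init=(0, 1), q=3)
profile3 = [cyclic_complexity(x3, ell) for ell in range(1, len(x3) + 1)]
print(x3)
print(profile3)  # 3 for ell = 1, then 8 = 3^2 - 1 for ell >= 2
```

With $q = 3$ and $k = 2$, the profile is $q^\ell$ for $\ell < k$ and $q^k - 1$ afterwards, matching the stated pattern.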

6. Representative Examples and Realizations

Explicit constructions and computations for small alphabets and periods exemplify maximal subword complexity:

  • For $k=3$, $n=7$, an m-sequence $x=0001011$ satisfies $p_{x^\infty}(1)=2$, $p_{x^\infty}(2)=4$, $p_{x^\infty}(3)=7$, and $p_{x^\infty}(\ell)=7$ for $4 \leq \ell \leq 7$, matching the combinatorial upper bounds.
  • For $k=4$, $n=15$, the sequence from $x^4 + x + 1$ achieves $p(1)=2, p(2)=4, p(3)=8, p(4)=15$, $p(\ell)=15$ for $\ell \geq 4$ (Kjos-Hanssen, 2016).
  • Thue–Morse and similar automatic sequences realize maximal arithmetical subword complexity via Gowers uniformity and Fourier-theoretic arguments, ensuring every pattern appears along an arithmetic progression (Konieczny et al., 2023).

These examples illustrate the structural and combinatorial richness of maximal subword complexity and its deep interconnections across automata, combinatorics, and algebraic constructions.
