Factor Complexity: Invariants & Applications

Updated 22 November 2025
  • Factor complexity is a quantitative invariant that measures the growth rate of distinct contiguous subwords in infinite sequences or dynamical systems.
  • It connects with palindromic complexity and classical results like the Morse–Hedlund theorem, providing a basis for classifying periodic and aperiodic words.
  • The concept has broad applications, from analyzing algorithmic tractability in structured prediction and bounding computational costs in algebraic factorization to measuring structural complexity in modified gravity.

Factor complexity is a central quantitative invariant in symbolic dynamics, combinatorics on words, structured prediction, and algebraic complexity theory. It typically measures the combinatorial growth rate of the number of distinct factors (contiguous subwords) of specified length in a given infinite word, language, sequence, or dynamical orbit. The concept has been rigorously developed in several mathematical frameworks, with each field emphasizing different aspects of complexity and its connections to underlying structure, ergodic properties, or algorithmic tractability.

1. Classical Factor Complexity in Combinatorics on Words

Given a finite alphabet $\mathcal{A}$ and an infinite word $w$ over $\mathcal{A}$, the factor complexity function $C(n)$ is defined by

$C(n) = |F_n(w)| \,, \qquad F_n(w) = \{ \text{all distinct subwords of } w \text{ of length } n \}\;.$

This function counts the number of distinct length-$n$ factors of $w$. Its growth rate classifies infinite words: ultimately periodic words have $C(n) = O(1)$, while aperiodic words must satisfy $C(n) \geq n+1$ due to the Morse–Hedlund theorem. Special cases such as Sturmian words achieve the minimal nontrivial complexity $C(n) = n+1$ for all $n$ (0802.1332, Bell, 2022).
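As a rough illustration of the definition, the following sketch counts factors on a finite prefix of the Fibonacci word, a standard Sturmian example, and on a purely periodic word. The helper names `fibonacci_word` and `factor_complexity` are chosen here for illustration, and a finite prefix can only give a lower estimate of $C(n)$, so $n$ is kept small relative to the prefix length.

```python
def fibonacci_word(length):
    """Return a prefix of the Fibonacci word (fixed point of 0 -> 01, 1 -> 0)."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]


def factor_complexity(word, n):
    """Count the distinct length-n factors that occur in a finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


prefix = fibonacci_word(5000)
sturmian_counts = [factor_complexity(prefix, n) for n in range(1, 21)]
print(sturmian_counts)   # grows like n + 1 on this prefix

periodic = "011" * 200   # purely periodic word with period 3
periodic_counts = [factor_complexity(periodic, n) for n in range(1, 11)]
print(periodic_counts)   # stays bounded
```

The periodic word's counts level off at its period, while the Sturmian counts keep increasing by exactly one, which is the dichotomy the Morse–Hedlund theorem describes.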

Extensions include concepts such as palindromic complexity $P(n)$, which counts palindromic factors of a given length, and $N$-factor complexity, which adapts the notion to infinite alphabets by restricting to factors using the first $N$ symbols (Li et al., 2022).

2. Factor Complexity and Structural Richness: Palindromic Connections

In words whose factor sets are closed under reversal, the interplay between factor complexity $C(n)$ and palindromic complexity $P(n)$ leads to deep combinatorial characterizations. Bucci, De Luca, Glen, and Zamboni proved the following equivalence: for any infinite word $w$ whose set of factors is closed under reversal, the conditions

  • (I) Every complete return to any palindromic factor is itself a palindrome,
  • (II) $P(n) + P(n+1) = C(n+1) - C(n) + 2$ for all $n \geq 0$,

are equivalent. This identity explicitly relates the increment of factor complexity to the sum of palindromic complexities and characterizes so-called "rich" words (0802.1332). It forces $C(n)$ to have at least linear growth and imposes restrictions on the periodicity structure, with Sturmian and episturmian words as canonical cases where these bounds are sharp.
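Condition (II) can be checked numerically on a finite prefix. The sketch below does this for the Fibonacci word, used here as an assumed test case because Sturmian words are among the canonical rich words; the function names are illustrative, and the empty word is counted as the single factor (and palindrome) of length 0.

```python
def fibonacci_word(length):
    """Return a prefix of the Fibonacci word (fixed point of 0 -> 01, 1 -> 0)."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]


def factors(word, n):
    """Set of distinct length-n factors of a finite word (n = 0 gives the empty word)."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def C(word, n):
    """Factor complexity on the given prefix."""
    return len(factors(word, n))


def P(word, n):
    """Palindromic complexity on the given prefix."""
    return sum(1 for u in factors(word, n) if u == u[::-1])


prefix = fibonacci_word(5000)
identity_holds = [
    P(prefix, n) + P(prefix, n + 1) == C(prefix, n + 1) - C(prefix, n) + 2
    for n in range(0, 20)
]
print(all(identity_holds))
```

For a Sturmian word both sides equal 3 at every $n$: the complexity increases by one at each step, and the palindrome counts alternate between 1 and 2.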

3. Factor Complexity in Dynamical Systems and Symbolic Coding

In topological dynamics, factor complexity often arises as the complexity function $p_X(n)$ of a (sub)shift $(X, \sigma)$:

$p_X(n) = \left| \{ w \in \mathcal{A}^n : [w]_+ \cap X \neq \emptyset \} \right|\,,$

where $[w]_+$ is the cylinder of elements starting with $w$. Boshernitzan's condition links the decay rate of cylinder measures to unique ergodicity and constrains possible complexity growth: while the condition implies zero topological entropy, Cyr and Kra constructed minimal, uniquely ergodic subshifts where $p(n)$ exceeds any assigned subexponential function infinitely often, showing that unique ergodicity does not force nearly-linear complexity (Cyr et al., 2020).

Further, factor complexity is intimately related to dynamical properties such as the structure of the subshift's automorphism group, topological entropy (i.e., exponential growth of $C(n)$), and spectral properties of associated operators (e.g., for Schrödinger operators, gap counts in the spectrum relate to bounds on $p(n)$).
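The link between complexity and entropy can be made concrete with the finite-$n$ quantity $\log_2 p(n)/n$, which tends to the topological entropy as $n$ grows. The sketch below evaluates it on a prefix of the Thue–Morse word, an example picked here for illustration rather than taken from the cited papers; because its complexity is linear, the estimate should drift toward zero.

```python
from math import log2


def thue_morse(length):
    """Prefix of the Thue-Morse word: letter i is the parity of the binary digit sum of i."""
    return "".join(str(bin(i).count("1") % 2) for i in range(length))


def complexity(word, n):
    """Number of distinct length-n factors seen in a finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


prefix = thue_morse(1 << 14)

# (n, p(n), log2(p(n)) / n) for a few window lengths.
rates = [(n, complexity(prefix, n), log2(complexity(prefix, n)) / n)
         for n in (2, 4, 8, 16, 32)]
for n, p, rate in rates:
    print(n, p, round(rate, 3))
```

A full shift on two letters would keep the last column at 1; the decay seen here is the finite-sample signature of zero entropy.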

4. Quantitative Bounds and Examples

The precise bounds for $C(n)$ in various classes of words and dynamical systems are highly nontrivial:

  • Sturmian shifts: $C(n) = n+1$, the minimal complexity among aperiodic shifts, together with unique ergodicity (0802.1332, Bell, 2022).
  • S-adic words from Arnoux-Rauzy-Poincaré substitutions: Berthé and Labbé showed $2n+1 \leq p(n) \leq (5/2)n+1$, where $p(n+1)-p(n) \in \{2,3\}$; their explicit combinatorial analysis via bispecial words demonstrates why the growth remains linear despite a rich substitution structure (Berthé et al., 2014).
  • For automatic words or interval exchange sequences, $C(n) = O(n)$, and in general, linear upper bounds on $C(n)$ imply strong finiteness results for possible orbit-closure topologies (Bell, 2022).
  • For infinite-alphabet or digital sequences, the $N$-factor complexity typically grows as $(n-1)N^2/2$ for fixed $n$ and $N \rightarrow \infty$ outside of degenerate counting cases (Li et al., 2022).
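The first difference $p(n+1) - p(n)$ that appears in the bounds above is easy to tabulate on a finite prefix. As a hedged example, the sketch below uses the Tribonacci word (fixed point of a → ab, b → ac, c → a), an Arnoux-Rauzy word with $p(n) = 2n+1$; it is chosen here only because it sits at the lower end of the range $2n+1 \leq p(n) \leq (5/2)n+1$ and is not taken from the cited analysis.

```python
def tribonacci_word(length):
    """Prefix of the Tribonacci word, the fixed point of a -> ab, b -> ac, c -> a."""
    rules = {"a": "ab", "b": "ac", "c": "a"}
    word = "a"
    while len(word) < length:
        word = "".join(rules[ch] for ch in word)
    return word[:length]


def complexity(word, n):
    """Number of distinct length-n factors seen in a finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


prefix = tribonacci_word(20000)
p = [complexity(prefix, n) for n in range(16)]       # p[0] counts the empty word
first_diff = [p[n + 1] - p[n] for n in range(15)]

print(p)
print(first_diff)   # constant 2 for this word; 2 or 3 is allowed by the bounds above
```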

5. Factor Complexity in Algebraic Complexity Theory

The notion of factor complexity also plays a pivotal role in the algebraic complexity of polynomial factorization. In this context, it quantitatively bounds the algebraic or approximative straight-line complexity $L(g)$ of a factor $g$ of a polynomial $f$ in terms of $L(f)$ and the degree $d$ of $g$. For polynomials of bounded degree over fields of characteristic zero, Bürgisser established

$L(g) \leq O\left( M(d)M(d^4)L(f) + d^{2\gamma} M(d)^2 \right)\,,$

where $M(d)$ is the cost of univariate multiplication and $\gamma$ is the matrix multiplication exponent (Bürgisser, 2018). This result directly extends and improves upon Kaltofen's earlier bounds, eliminating explicit dependence on the multiplicity exponent $e$ via perturbation arguments.
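To make the quantity $L(\cdot)$ more tangible, here is a toy straight-line program evaluator. It is only a sketch under simplifying assumptions: the instruction format and the name `run_slp` are invented for illustration, and every addition and multiplication is counted as one step, which is one of several conventions in use. The example polynomial $f = (x+y)^8$ has a short program even though its expanded form has nine monomials, and its factor $g = x + y$ has an even shorter one.

```python
def run_slp(program, inputs):
    """Evaluate a straight-line program.

    Registers 0 .. len(inputs) - 1 hold the inputs. Each instruction
    (op, i, j) appends op(register i, register j) as a new register.
    The value of the last register is returned.
    """
    regs = list(inputs)
    for op, i, j in program:
        regs.append(regs[i] + regs[j] if op == "+" else regs[i] * regs[j])
    return regs[-1]


# f = (x + y)^8: one addition followed by three squarings.
f_program = [("+", 0, 1), ("*", 2, 2), ("*", 3, 3), ("*", 4, 4)]
# g = x + y, a factor of f.
g_program = [("+", 0, 1)]

x, y = 3, 5
f_value = run_slp(f_program, (x, y))
g_value = run_slp(g_program, (x, y))
print(len(f_program), len(g_program))   # program lengths as a stand-in for L(f), L(g)
```

Here the factor is trivially cheap; the content of the bound above is that a comparable guarantee holds for every factor of degree $d$, not only for ones that are visible in the program.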

In this setting, factor complexity quantifies computational resources for implicit data representations (such as the graph of a one-way function) and underpins the security assumptions in cryptographic protocols.

6. Factor-Graph Complexity in Structured Prediction

In statistical learning, particularly structured prediction (e.g., sequence labeling, graphical models), "factor graph complexity" is a data-dependent analog of factor complexity. It quantifies the Rademacher complexity of hypothesis classes that decompose over graph factors, controlling generalization bounds for models such as conditional random fields (CRFs). The factor graph complexity $\Re^G_m(\mathcal{H})$ enters directly into the tightest known margin-based risk bounds, and empirical studies indicate that controlling this complexity improves generalization, especially for high-order or highly connected factor graphs (Cortes et al., 2016).

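The empirical factor graph complexity on a sample $S$ of size $m$ is

$\widehat{\Re}^G_S(\mathcal{H}) = \frac{1}{m} \mathbb{E}_{\sigma}\left[ \sup_{h \in \mathcal{H}} \sum_{i=1}^m \sum_{f \in F_i} \sum_{y \in \mathcal{Y}_f} \sqrt{|F_i|}\, \sigma_{i,f,y} h_f(x_i, y) \right]\,,$

where the $\sigma_{i,f,y}$ are independent Rademacher variables, $F_i$ is the set of factors for the $i$-th example, and $\mathcal{Y}_f$ is the set of label assignments of factor $f$. It provides an estimable measure, amenable to explicit upper bounds, that connects factor structure, feature sparsity, and algorithmic learning rates.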
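For a very small problem the expectation in this formula can be evaluated exactly by enumerating every sign pattern. The following sketch does that for a finite hypothesis class; the nested-list encoding `hypotheses[k][i][f][y]` for $h_f(x_i, y)$ and the function name are assumptions made for the example, and the enumeration is only feasible when the total number of $(i, f, y)$ triples is small.

```python
from itertools import product
from math import sqrt


def empirical_factor_graph_complexity(hypotheses):
    """Exact empirical factor graph complexity of a finite class.

    hypotheses[k][i][f][y] stores h_f(x_i, y) for the k-th hypothesis;
    all hypotheses must share the same nested shape.
    """
    template = hypotheses[0]
    m = len(template)
    # One Rademacher variable per (example, factor, label) triple.
    index = [(i, f, y)
             for i, example in enumerate(template)
             for f, scores in enumerate(example)
             for y in range(len(scores))]
    total = 0.0
    for sigma in product((-1.0, 1.0), repeat=len(index)):
        # Supremum over the class of the sqrt(|F_i|)-weighted signed sum.
        total += max(
            sum(s * sqrt(len(h[i])) * h[i][f][y]
                for s, (i, f, y) in zip(sigma, index))
            for h in hypotheses
        )
    return total / (2 ** len(index)) / m


# Two examples: the first has two factors, the second has one.
h = [[[0.5, -0.2], [0.1, 0.3]], [[1.0, 0.0]]]
neg_h = [[[-v for v in scores] for scores in example] for example in h]

single = empirical_factor_graph_complexity([h])
symmetric = empirical_factor_graph_complexity([h, neg_h])
print(single, symmetric)
```

A class with a single hypothesis has zero complexity because the signed sum averages out, while adding the negated hypothesis makes the supremum an absolute value and the complexity strictly positive.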

7. Factor Complexity in Modified Gravity and Self-Gravitating Systems

The "complexity factor" concept, originating in general relativity, quantifies the structural complexity of self-gravitating fluid spheres through an invariant scalar constructed from the orthogonal splitting of the Riemann tensor. In both classical GR and large families of modified gravity theories ($f(R)$, Palatini $f(R)$, $f(R,T,R_{\mu\nu}T^{\mu\nu})$, Rastall–Rainbow, and others), the complexity factor $Y_{TF}$ encodes the deviations from minimal (homogeneous, isotropic) structure due to density gradients, pressure anisotropy, charge, and modification-induced terms (Abbas et al., 2018, Yousaf, 2020, Sharif et al., 2018, Yousaf et al., 2020, Ye et al., 2024, Sharif et al., 2023, Heras et al., 2022, Andrade et al., 2021, Bhattacharya et al., 2023).

A generic formulation is:

$Y_{TF}(r) = \kappa(p_r - p_t) - \frac{4\pi}{r^3} \int_0^r \tilde r^3 \rho'(\tilde r)\,d\tilde r + (\text{correction terms depending on theory})$

The vanishing of $Y_{TF}$ characterizes the least complex equilibrium: exactly homogeneous, isotropic (or with tuned anisotropy cancelling inhomogeneity and theory corrections). In gravitational decoupling, $Y_{TF}$ is used as a supplementary closure, generating new families of solutions parametrized by "complexity profiles."
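The generic formula can be evaluated numerically once the profiles are given. The sketch below drops the theory-dependent correction terms and assumes $\kappa = 8\pi$ (units with $G = c = 1$); both choices, the function name, and the sample profiles are assumptions made for illustration. The integral is approximated with a composite trapezoid rule.

```python
from math import pi


def complexity_factor(r, rho_prime, p_r, p_t, kappa=8 * pi, steps=2000):
    """Y_TF(r) for r > 0, without theory-dependent correction terms.

    rho_prime, p_r and p_t are callables of the radius; kappa = 8*pi is
    an assumed normalization (G = c = 1).
    """
    h = r / steps
    # Composite trapezoid rule for the integral of s^3 * rho'(s) over [0, r].
    samples = [(k * h) ** 3 * rho_prime(k * h) for k in range(steps + 1)]
    integral = h * (sum(samples) - 0.5 * (samples[0] + samples[-1]))
    return kappa * (p_r(r) - p_t(r)) - (4 * pi / r ** 3) * integral


def pressure(s):
    """Hypothetical isotropic pressure profile used for both p_r and p_t."""
    return 1.0 - s * s


# Homogeneous, isotropic sphere: density gradient and anisotropy both vanish.
y_homogeneous = complexity_factor(0.5, lambda s: 0.0, pressure, pressure)

# Isotropic sphere with rho(s) = 1 - s^2, so rho'(s) = -2 s.
y_parabolic = complexity_factor(0.5, lambda s: -2.0 * s, pressure, pressure)

print(y_homogeneous, y_parabolic)
```

For the parabolic density the integral can be done by hand, giving $Y_{TF}(r) = 8\pi r^2/5$, so the numerical value at $r = 0.5$ should be close to $0.4\pi$; the homogeneous case returns zero, matching the least-complex configuration described above.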

Summary Table: Main Notions of Factor Complexity

| Context | Formal Definition/Key Formula | Role/Consequences |
|---|---|---|
| Symbolic Dynamics | $C(n) = \lvert\{\text{length-}n\ \text{factors of } w\}\rvert$ | Invariant for shift orbits, entropy |
| Palindromic Complexity | $P(n)$ = number of palindromic factors of length $n$ | Characterizes "rich" words |
| $N$-Factor Complexity | $P_a(n,N) = \lvert F_a(n) \cap E_N^n \rvert$ | Handles infinite alphabets |
| Algebraic Polynomials | $L(g) \leq \mathrm{poly}(L(f),d)$ for $f$ with factor $g$ | Upper bounds for factorization cost |
| Structured Prediction | $\widehat{\Re}^G_S(\mathcal{H})$ (as above) | Data-dependent learning rates |
| Relativistic Fluids | $Y_{TF} = \kappa(p_r-p_t) - \frac{4\pi}{r^3}\int_0^r \tilde r^3 \rho'\, d\tilde r$ | Measures deviation from minimal structure |

This array of precise, theory-driven complexity measures underpins fine classification of combinatorial, dynamical, computational, and physical systems, revealing structural and algorithmic constraints and enabling a principled analysis of complexity in diverse mathematical domains.
