Factor Complexity: Invariants & Applications

Updated 22 November 2025

Factor complexity is a quantitative invariant that measures the growth rate of distinct contiguous subwords in infinite sequences or dynamical systems.
It connects with palindromic complexity and classical results like the Morse–Hedlund theorem, providing a basis for classifying periodic and aperiodic words.
The concept has broad applications, from generalization bounds in structured prediction to complexity bounds for polynomial factorization, and an analogous "complexity factor" appears in general relativity and modified gravity.

Factor complexity is a central quantitative invariant in symbolic dynamics, combinatorics on words, structured prediction, and algebraic complexity theory. It typically measures the combinatorial growth rate of the number of distinct factors (contiguous subwords) of specified length in a given infinite word, language, sequence, or dynamical orbit. The concept has been rigorously developed in several mathematical frameworks, with each field emphasizing different aspects of complexity and its connections to underlying structure, ergodic properties, or algorithmic tractability.

1. Classical Factor Complexity in Combinatorics on Words

Given a finite alphabet $\mathcal{A}$ and an infinite word $w$ over $\mathcal{A}$, the factor complexity function $C(n)$ is defined by

$C(n) = |F_n(w)|\,, \qquad F_n(w) = \{ u \in \mathcal{A}^n : u \text{ is a factor of } w \}\,.$

This function counts the number of distinct length-$n$ factors of $w$. Its growth rate classifies infinite words: ultimately periodic words have $C(n) = O(1)$, while aperiodic words must satisfy $C(n) \geq n+1$ due to the Morse–Hedlund theorem. Special cases such as Sturmian words achieve the minimal nontrivial complexity $C(n) = n+1$ for all $n$ (0802.1332, Bell, 2022).
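The dichotomy above can be checked numerically on finite prefixes. The following is a minimal sketch, assuming the Fibonacci word (the fixed point of the substitution 0 -> 01, 1 -> 0) as the Sturmian example and the periodic word (011)(011)(011)... as the ultimately periodic one; both choices and all function names are illustrative. Counting factors in a finite prefix only approximates $C(n)$ from below, so the prefix is taken much longer than the factor lengths examined.

```python
def fibonacci_word(length):
    """Prefix of the Fibonacci word, the fixed point of 0 -> 01, 1 -> 0."""
    w = "0"
    while len(w) < length:
        w = "".join("01" if c == "0" else "0" for c in w)
    return w[:length]


def factor_complexity(w, n):
    """Number of distinct length-n factors occurring in the finite string w."""
    return len({w[i:i + n] for i in range(len(w) - n + 1)})


sturmian = fibonacci_word(2000)   # aperiodic, Sturmian
periodic = "011" * 700            # purely periodic with period 3

sturmian_counts = [factor_complexity(sturmian, n) for n in range(1, 11)]
periodic_counts = [factor_complexity(periodic, n) for n in range(1, 11)]

print(sturmian_counts)  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11], i.e. n + 1
print(periodic_counts)  # [2, 3, 3, 3, 3, 3, 3, 3, 3, 3], bounded
```

The periodic word already satisfies $C(3) = 3 \leq 3$, which is the kind of witness the Morse–Hedlund theorem rules out for aperiodic words.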

Extensions include concepts such as palindromic complexity $P(n)$, which counts palindromic factors of a given length, and $N$-factor complexity, which adapts the notion to infinite alphabets by restricting to factors using the first $N$ symbols (Li et al., 2022).

2. Factor Complexity and Structural Richness: Palindromic Connections

In words whose factor sets are closed under reversal, the interplay between factor complexity $C(n)$ and palindromic complexity $P(n)$ leads to deep combinatorial characterizations. Bucci, De Luca, Glen, and Zamboni proved the following equivalence: for any infinite word $w$ whose set of factors is closed under reversal, the conditions

(I) Every complete return to any palindromic factor is itself a palindrome,
(II) $P(n) + P(n+1) = C(n+1) - C(n) + 2$ for all $n \geq 0$,

are equivalent. This identity explicitly relates the increment of factor complexity to the sum of palindromic complexities and characterizes so-called "rich" words (0802.1332). It ties the first difference $C(n+1) - C(n)$ directly to the number of palindromic factors and thereby restricts the periodicity structure; for aperiodic rich words the Morse–Hedlund theorem additionally gives at least linear growth of $C(n)$. Sturmian and episturmian words are canonical examples of rich words.
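Identity (II) can be tested directly on a concrete word. The sketch below assumes the Fibonacci word as a sample Sturmian (hence rich) word and evaluates both sides on a long finite prefix; the helper names are illustrative and the check is a numerical sanity test, not a proof.

```python
def fibonacci_word(length):
    """Prefix of the Fibonacci word, the fixed point of 0 -> 01, 1 -> 0."""
    w = "0"
    while len(w) < length:
        w = "".join("01" if c == "0" else "0" for c in w)
    return w[:length]


def factors(w, n):
    """Set of distinct length-n factors of the finite string w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def C(w, n):
    """Factor complexity: number of distinct factors of length n."""
    return len(factors(w, n))


def P(w, n):
    """Palindromic complexity: number of distinct palindromic factors of length n."""
    return sum(1 for u in factors(w, n) if u == u[::-1])


w = fibonacci_word(3000)

# Left and right sides of (II) for n = 0, ..., 14.
lhs = [P(w, n) + P(w, n + 1) for n in range(15)]
rhs = [C(w, n + 1) - C(w, n) + 2 for n in range(15)]
identity_holds = lhs == rhs
print(identity_holds)  # True
```

Here $C(n+1) - C(n) = 1$ for every $n$, so both sides equal 3: the word has one palindromic factor of each even length (including the empty word) and two of each odd length.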

3. Factor Complexity in Dynamical Systems and Symbolic Coding

In topological dynamics, factor complexity often arises as the complexity function $p_X(n)$ of a (sub)shift $(X, \sigma)$:

$p_X(n) = \left| \{ w \in \mathcal{A}^n : [w]_+ \cap X \neq \emptyset \} \right|\,,$

where $[w]_+$ is the cylinder of elements starting with $w$. Boshernitzan's condition links the decay rate of cylinder measures to unique ergodicity and constrains possible complexity growth. The condition implies zero topological entropy, yet Cyr and Kra constructed minimal, uniquely ergodic subshifts whose complexity $p(n)$ exceeds any prescribed subexponential function infinitely often. Unique ergodicity therefore does not force nearly linear complexity (Cyr et al., 2020).

Further, factor complexity is intimately related to dynamical properties such as the structure of the subshift's automorphism group, topological entropy (the exponential growth rate $\lim_{n \to \infty} \frac{1}{n} \log p_X(n)$), and spectral properties of associated operators (e.g., for Schrödinger operators, gap counts in the spectrum relate to bounds on $p(n)$).
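To make the link between $p_X(n)$ and entropy concrete, here is a small sketch for a subshift that is not discussed above and is chosen purely for illustration: the golden mean shift, consisting of all binary sequences with no two consecutive 1s. Every finite word avoiding 11 extends to a point of this shift, so $p_X(n)$ can be found by brute-force enumeration; the function name is hypothetical.

```python
import math
from itertools import product


def golden_mean_complexity(n):
    """p_X(n) for the golden mean shift: binary words of length n avoiding '11'."""
    return sum(1 for t in product("01", repeat=n) if "11" not in "".join(t))


counts = [golden_mean_complexity(n) for n in range(1, 15)]
print(counts[:6])  # [2, 3, 5, 8, 13, 21], Fibonacci numbers

# Finite-n entropy estimate versus the exact value log((1 + sqrt(5)) / 2).
estimate = math.log(counts[-1]) / 14
exact = math.log((1 + math.sqrt(5)) / 2)
print(estimate, exact)
```

Unlike the linear-complexity examples, here $p_X(n)$ grows exponentially and the estimate $\frac{1}{n}\log p_X(n)$ approaches a positive limit, which is what positive topological entropy means.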

4. Quantitative Bounds and Examples

The precise bounds for $C(n)$ in various classes of words and dynamical systems are highly nontrivial:

Sturmian shifts: $C(n) = n+1$, the minimal complexity among aperiodic words; these shifts are uniquely ergodic (0802.1332, Bell, 2022).
S-adic words from Arnoux-Rauzy-Poincaré substitutions: Berthé and Labbé showed $2n+1 \leq p(n) \leq (5/2)n+1$, with $p(n+1)-p(n) \in \{2,3\}$; their explicit combinatorial analysis via bispecial words demonstrates why the growth remains linear despite a rich substitution structure (Berthé et al., 2014).
Automatic words and interval exchange sequences: $C(n) = O(n)$; in general, linear upper bounds on $C(n)$ imply strong finiteness results for possible orbit-closure topologies (Bell, 2022).
Infinite-alphabet or digital sequences: the $N$-factor complexity typically grows as $(n-1)N^2/2$ for fixed $n$ and $N \rightarrow \infty$, outside of degenerate counting cases (Li et al., 2022).
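As an illustration of the linear bound for automatic words, the sketch below uses the Thue–Morse word, a standard 2-automatic word that is assumed here as the example (it is not singled out in the sources cited above). Factors are counted in a long finite prefix, so the values are exact only for lengths that are small relative to the prefix.

```python
def thue_morse(length):
    """Prefix of the Thue-Morse word: letter i is the parity of the 1-bits of i."""
    return "".join(str(bin(i).count("1") % 2) for i in range(length))


def factor_complexity(w, n):
    """Number of distinct length-n factors occurring in the finite string w."""
    return len({w[i:i + n] for i in range(len(w) - n + 1)})


w = thue_morse(4096)
counts = [factor_complexity(w, n) for n in range(1, 9)]
print(counts)  # [2, 4, 6, 10, 12, 16, 20, 22]

# Linear growth: the ratio C(n)/n stays bounded (here below 4).
ratios = [c / n for n, c in enumerate(counts, start=1)]
print(max(ratios))
```

The increments are 2 or 4 rather than the constant 1 of a Sturmian word, but the growth is still linear.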

5. Factor Complexity in Algebraic Complexity Theory

The notion of factor complexity also plays a pivotal role in the algebraic complexity of polynomial factorization. In this context, it quantitatively bounds the algebraic or approximative straight-line complexity $L(g)$ of a factor $g$ of a polynomial $f$ in terms of $L(f)$ and the degree $d$ of $g$. For polynomials of bounded degree over fields of characteristic zero, Bürgisser established

$L(g) \leq O\left( M(d)M(d^4)L(f) + d^{2\gamma} M(d)^2 \right)\,,$

where $M(d)$ is the cost of univariate multiplication and $\gamma$ is the matrix multiplication exponent (Bürgisser, 2018). This result directly extends and improves upon Kaltofen's earlier bounds, eliminating explicit dependence on the multiplicity exponent $e$ via perturbation arguments.

In this setting, factor complexity quantifies computational resources for implicit data representations (such as the graph of a one-way function) and underpins the security assumptions in cryptographic protocols.
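For readers unfamiliar with straight-line complexity, the following sketch shows what $L(f)$ counts, under the usual convention (assumed here) that it is the number of arithmetic instructions in a shortest straight-line program computing $f$. The example polynomial $f(x) = x^{2^k} - 1$, its factor $g(x) = x - 1$, and the instruction format are all illustrative choices; the code does not implement any factorization algorithm from the cited work.

```python
def slp_power_minus_one(k):
    """Straight-line program for f(x) = x**(2**k) - 1.

    Register 0 holds the input x and register 1 the constant 1; each
    instruction (op, i, j) writes its result to a fresh register.
    """
    prog = []
    last = 0                          # register currently holding x**(2**t)
    for _ in range(k):
        prog.append(("mul", last, last))
        last = len(prog) + 1          # index of the register just written
    prog.append(("sub", last, 1))
    return prog


def evaluate(prog, x):
    """Run a straight-line program on input x and return the last register."""
    regs = [x, 1]
    for op, i, j in prog:
        a, b = regs[i], regs[j]
        if op == "mul":
            regs.append(a * b)
        elif op == "add":
            regs.append(a + b)
        else:                         # "sub"
            regs.append(a - b)
    return regs[-1]


prog = slp_power_minus_one(5)
print(len(prog))          # 6 instructions for a polynomial of degree 32
print(evaluate(prog, 1))  # 0, so g(x) = x - 1 divides f
```

The program length grows like $k$ while the degree of $f$ grows like $2^k$, which is why bounds on $L(g)$ in terms of $L(f)$ and the degree $d$ of the factor, rather than the degree of $f$, are the meaningful ones.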

6. Factor-Graph Complexity in Structured Prediction

In statistical learning, particularly structured prediction (e.g., sequence labeling, graphical models), "factor graph complexity" is a data-dependent analog of factor complexity. It quantifies the Rademacher complexity of hypothesis classes that decompose over graph factors, controlling generalization bounds for models such as conditional random fields (CRFs). The factor graph complexity $\Re^G_m(\mathcal{H})$ enters directly into the tightest known margin-based risk bounds, and empirical studies indicate that controlling this complexity improves generalization, especially for high-order or highly connected factor graphs (Cortes et al., 2016).

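The empirical factor graph complexity of $\mathcal{H}$ on a sample $S = (x_1, \ldots, x_m)$ is

$\widehat{\Re}^G_S(\mathcal{H}) = \frac{1}{m} \mathbb{E}_{\sigma}\left[ \sup_{h \in \mathcal{H}} \sum_{i=1}^m \sum_{f \in F_i} \sum_{y \in \mathcal{Y}_f} \sqrt{|F_i|}\, \sigma_{i,f,y} h_f(x_i, y) \right]\,,$

where $F_i$ is the set of factors of the factor graph for $x_i$, $\mathcal{Y}_f$ is the set of label assignments for factor $f$, and the $\sigma_{i,f,y}$ are independent Rademacher variables. It provides an estimable, data-dependent quantity that can be upper-bounded explicitly, connecting factor structure, feature sparsity, and learning rates.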
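For a finite hypothesis class, the empirical quantity above can be approximated by sampling the Rademacher variables. The sketch below is a plain Monte Carlo estimate under simplifying assumptions that are not taken from the cited paper: each hypothesis is given as a precomputed table `scores[i][f][y]` of values $h_f(x_i, y)$, all hypotheses share the same table shape, and the function and variable names are hypothetical.

```python
import math
import random


def empirical_factor_graph_complexity(hypotheses, num_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical factor graph complexity.

    hypotheses: list of score tables, where table[i][f][y] = h_f(x_i, y).
    """
    rng = random.Random(seed)
    m = len(hypotheses[0])
    total = 0.0
    for _ in range(num_draws):
        # One Rademacher sign per (sample i, factor f, label y), shared by all h.
        sigma = [[[rng.choice((-1, 1)) for _ in factor] for factor in sample]
                 for sample in hypotheses[0]]
        # Supremum over the (finite) class of the weighted signed sum.
        total += max(
            sum(math.sqrt(len(h[i])) * sigma[i][f][y] * h[i][f][y]
                for i in range(m)
                for f in range(len(h[i]))
                for y in range(len(h[i][f])))
            for h in hypotheses
        )
    return total / (num_draws * m)


# Toy class: m = 2 samples, 2 factors per sample, 2 labels per factor.
h = [[[1.0, 0.0], [0.5, -0.5]], [[0.0, 1.0], [1.0, 1.0]]]
neg_h = [[[-v for v in factor] for factor in sample] for sample in h]
zero = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]

single = empirical_factor_graph_complexity([h])
symmetric = empirical_factor_graph_complexity([h, neg_h])
trivial = empirical_factor_graph_complexity([zero])
print(single, symmetric, trivial)
```

A single hypothesis gives an estimate near zero, while the symmetric class containing both `h` and `neg_h` gives a strictly positive value, reflecting the larger capacity of the class.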

7. Factor Complexity in Modified Gravity and Self-Gravitating Systems

The "complexity factor" concept, originating in general relativity, quantifies the structural complexity of self-gravitating fluid spheres through an invariant scalar constructed from the orthogonal splitting of the Riemann tensor. In both classical GR and large families of modified gravity theories ( $f(R)$ , Palatini $f(R)$ , $f(R,T,R_{\mu\nu}T^{\mu\nu})$ , Rastall–Rainbow, and others), the complexity factor $Y_{TF}$ encodes the deviations from minimal (homogeneous, isotropic) structure due to density gradients, pressure anisotropy, charge, and modification-induced terms (Abbas et al., 2018, Yousaf, 2020, Sharif et al., 2018, Yousaf et al., 2020, Ye et al., 2024, Sharif et al., 2023, Heras et al., 2022, Andrade et al., 2021, Bhattacharya et al., 2023).

A generic formulation is:

$Y_{TF}(r) = \kappa(p_r - p_t) - \frac{4\pi}{r^3} \int_0^r \tilde r^3 \rho'(\tilde r)\,d\tilde r + (\text{correction terms depending on theory})$

The vanishing of $Y_{TF}$ characterizes the least complex equilibrium: either an exactly homogeneous, isotropic configuration, or one in which the anisotropy is tuned to cancel the contributions of inhomogeneity and of the theory-dependent corrections. In gravitational decoupling, the condition on $Y_{TF}$ is used as a supplementary closure relation, generating new families of solutions parametrized by "complexity profiles."
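The generic formula can be evaluated numerically once a density profile and the pressures are given. The sketch below covers only the general-relativistic part, with the theory-dependent correction terms omitted and $\kappa = 8\pi$ assumed; the parabolic density profile $\rho(r) = \rho_0 (1 - r^2/R^2)$ and all names are illustrative choices, not taken from the cited papers.

```python
import math


def complexity_factor(r, rho_prime, p_r, p_t, kappa=8 * math.pi, steps=2000):
    """Y_TF(r) without correction terms, using the trapezoid rule.

    rho_prime: function giving the radial derivative of the energy density.
    p_r, p_t:  radial and tangential pressures evaluated at radius r.
    """
    h = r / steps
    integral = 0.0
    for k in range(steps + 1):
        s = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        integral += weight * s ** 3 * rho_prime(s)
    integral *= h
    return kappa * (p_r - p_t) - 4 * math.pi / r ** 3 * integral


# Homogeneous, isotropic fluid: the complexity factor vanishes.
y_homogeneous = complexity_factor(0.5, lambda s: 0.0, p_r=0.1, p_t=0.1)

# Isotropic fluid with rho = rho0 * (1 - r^2 / R^2), rho0 = R = 1.
rho0, R = 1.0, 1.0
y_parabolic = complexity_factor(0.5, lambda s: -2 * rho0 * s / R ** 2,
                                p_r=0.1, p_t=0.1)
closed_form = 8 * math.pi * rho0 * 0.5 ** 2 / (5 * R ** 2)
print(y_homogeneous, y_parabolic, closed_form)
```

For the parabolic profile the integral can be done by hand, giving $Y_{TF}(r) = 8\pi\rho_0 r^2 / (5R^2)$ in the isotropic case, which the numerical value reproduces; a negative density gradient thus yields a positive complexity factor unless anisotropy compensates for it.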

References

Factor complexity, palindromic complexity, and Rauzy graphs: (0802.1332)
Topological invariants, Rec(w), and linear upper bounds: (Bell, 2022)
Linear bounds for S-adic systems and ARP algorithms: (Berthé et al., 2014)
Existence of zero entropy, high-complexity, uniquely ergodic subshifts: (Cyr et al., 2020)
$N$ -factor complexity for sequences on infinite alphabets: (Li et al., 2022)
Algebraic complexity of polynomial factors: (Bürgisser, 2018)
Factor-graph Rademacher complexity: (Cortes et al., 2016)
Complexity factor in self-gravitating spheres, modified gravity: (Abbas et al., 2018, Sharif et al., 2023, Ye et al., 2024, Yousaf, 2020, Yousaf et al., 2020, Heras et al., 2022, Andrade et al., 2021, Sharif et al., 2018, Bhattacharya et al., 2023)

Summary Table: Main Notions of Factor Complexity

Context	Formal Definition/Key Formula	Role/Consequences
Symbolic Dynamics	$C(n) = |\{\text{length-}n\ \text{factors of }w\}|$	Invariant for shift orbits, entropy
Palindromic Complexity	$P(n)$ = number of palindromic factors of length $n$	Characterizes "rich" words
$N$-Factor Complexity	$P_a(n,N) = |F_a(n) \cap E_N^n|$	Handles infinite alphabets
Algebraic Polynomials	$L(g) \leq \mathrm{poly}(L(f),d)$ for $f$ with factor $g$	Upper bounds for factorization cost
Structured Prediction	$\widehat{\Re}^G_S(\mathcal{H})$ (as above)	Data-dependent learning rates
Relativistic Fluids	$Y_{TF} = \kappa(p_r-p_t) - \frac{4\pi}{r^3}\int_0^r \tilde r^3 \rho'\, d\tilde r$	Measures deviation from minimal structure

This array of precise, theory-driven complexity measures underpins fine classification of combinatorial, dynamical, computational, and physical systems, revealing structural and algorithmic constraints and enabling a principled analysis of complexity in diverse mathematical domains.