MSO k-ary Queries & Extensions

Updated 13 December 2025

MSO k-ary queries are defined using monadic second-order logic with k free variables to express relations over words, trees, and other structures.
The analysis reveals that plain MSO cannot define k-hashing properties, necessitating minimal extensions like equi-cardinality predicates for exact counting.
Parameterization results and automata techniques enable encoding query outputs efficiently, impacting coding theory, transductions, and logical query optimization.

Monadic second-order (MSO) logic provides a canonical formalism for expressing queries and properties over words, trees, and other relational structures. MSO $k$-ary queries are the relations expressible by MSO formulas with $k$ free first-order variables: each such formula maps a structure to the set of $k$-tuples that satisfy it. These queries underlie a wide range of definability, expressiveness, and model-theoretic phenomena at the interface of logic, automata, and combinatorics, with applications in coding theory, string transductions, and characterizations of expressive power. Recent results delineate the boundaries of this definability, with consequences both for structural complexity and for the practical design of logical formalisms.

1. Formal Definitions: MSO k-ary Queries and Key Examples

Let $\Sigma$ be a finite alphabet and $w \in \Sigma^*$ a finite word. The MSO structure on $w$ has domain $\{1, \ldots, |w|\}$, the natural order $<$, and unary predicates $P_a(x)$, one per letter $a \in \Sigma$, holding exactly at the positions labeled $a$. Monadic second-order logic allows quantification over both individual positions and sets of positions. A $k$-ary MSO query is any formula $\varphi(x_1,\ldots,x_k)$, where $x_1,\ldots,x_k$ are free first-order variables interpreted as word positions.

For each $w$, the result set is

$Q_\varphi(w) = \{ (i_1, \ldots, i_k) \mid w \models \varphi(i_1,\ldots,i_k) \}.$

Such definable relations encompass a wide range of combinatorial properties. Example: for $\Sigma = \{0,1\}$, let $\eta_0(x) := P_0(x)$ and $\eta_1(x) := P_1(x)$, so that $\eta_0$ and $\eta_1$ select the $\#0(w)$ positions carrying a 0 and the $\#1(w)$ positions carrying a 1, respectively. The ternary query $\varphi(i,j,d) := P_0(i) \wedge P_1(j) \wedge \text{Even}_0^d(i+1,j-1)$, where $\text{Even}_0^d(i+1,j-1)$ expresses that the number of 0’s strictly between positions $i$ and $j$ is congruent to $d$ mod $2$, is MSO-definable; its result set has size $O(\#0(w)\cdot\#1(w))$ (Nguyên et al., 6 Dec 2025).
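The result-set definition can be made concrete with a brute-force evaluator. The following is a minimal sketch, not the construction used in the cited work: the helper names (`query_result`, `zeros_strictly_between`, `phi_parity`) are hypothetical, the formula is replaced by an ordinary Python predicate, and the parity $d$ is treated as a fixed parameter in $\{0,1\}$ rather than as a third position variable, which is an assumption made for illustration.

```python
from itertools import product

def query_result(w, k, phi):
    """Return Q_phi(w): all k-tuples of 1-based positions of w satisfying phi."""
    positions = range(1, len(w) + 1)
    return {t for t in product(positions, repeat=k) if phi(w, *t)}

def zeros_strictly_between(w, i, j):
    """Count letters '0' at positions p with i < p < j (empty range if j <= i + 1)."""
    return sum(1 for p in range(i + 1, j) if w[p - 1] == "0")

def phi_parity(d):
    """Binary query: P_0(i), P_1(j), and the number of 0's strictly between is d mod 2."""
    def phi(w, i, j):
        return (w[i - 1] == "0" and w[j - 1] == "1"
                and zeros_strictly_between(w, i, j) % 2 == d)
    return phi

w = "0100110"
even_pairs = query_result(w, 2, phi_parity(0))
odd_pairs = query_result(w, 2, phi_parity(1))

# Every (0-position, 1-position) pair lands in exactly one parity class,
# which is why the result size is bounded by #0(w) * #1(w).
print(len(even_pairs) + len(odd_pairs), w.count("0") * w.count("1"))
```

Enumerating all $k$-tuples takes time $|w|^k$, so this only serves to check small examples by hand; it says nothing about how an automaton-based evaluator would proceed.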

2. Definability and Limitations: The k-Hashing Problem

Certain $k$-ary relations of interest in computer science are not MSO-definable. For integers $n \geq 1$ and $1 \leq k \leq b$, let $(w_0,\ldots,w_{k-1}) \in (\{0,1,\ldots,b-1\}^n)^k$ be a $k$-tuple of words. The tuple is $(b,k)$-hashed if there exists a coordinate $1 \leq \ell \leq n$ such that all $k$ words have pairwise distinct symbols at position $\ell$.
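As a direct reading of this definition, the sketch below lists the coordinates that witness the hashed property. The function names are hypothetical, and words are assumed to be tuples of integers in $\{0,\ldots,b-1\}$ of equal length.

```python
def hashing_coordinates(words):
    """Return the 1-based coordinates where all words carry pairwise distinct symbols."""
    n = len(words[0])
    if any(len(w) != n for w in words):
        raise ValueError("all words must have the same length n")
    k = len(words)
    # A coordinate is a witness when the k symbols found there are all different.
    return [l + 1 for l in range(n) if len({w[l] for w in words}) == k]

def is_hashed(words):
    """A tuple of words is hashed if at least one witnessing coordinate exists."""
    return bool(hashing_coordinates(words))

triple = [(0, 1, 2), (0, 2, 2), (1, 0, 2)]   # b = 3, k = 3, n = 3
print(hashing_coordinates(triple))            # only the second coordinate separates all three
print(is_hashed([(0, 1), (0, 1), (1, 0)]))    # two equal words can never be separated
```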

Testing whether a code is $k$-hashing, and computing the maximal size of $k$-hash codes, arises in combinatorics and information theory. In the MSO framework, words are modeled as paths in the infinite $b$-ary tree, with sets $X_0,\ldots,X_{k-1}$ encoding the $k$ words as sets of tree nodes. An "obvious" MSO formula attempts to witness the $k$-hash property by existentially guessing a level $\ell$ at which the paths take $k$ distinct next-edge moves (corresponding to the $k$ symbols), but this approach ultimately fails in full generality.
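The tree encoding can be sketched on finite words as follows. This is an illustration under stated assumptions, not the encoding of the cited paper: a node is represented by the prefix leading to it, the root is omitted, and all function names are hypothetical. The check mirrors the "guess a level with $k$ distinct edges" idea at the level of plain set manipulation, outside any logic.

```python
from itertools import combinations

def path_nodes(word):
    """Encode a word as a path in the b-ary tree: the set of its nonempty prefixes."""
    return {tuple(word[:d]) for d in range(1, len(word) + 1)}

def node_at_depth(path, depth):
    """The unique node of a path at the given depth (its intersection with that level)."""
    (node,) = [v for v in path if len(v) == depth]
    return node

def hashed_via_tree(paths, n):
    """Search for a level at which the k paths enter through k pairwise distinct edges."""
    k = len(paths)
    for depth in range(1, n + 1):
        # The last symbol of a depth-d node is the edge taken to reach it.
        edges = {node_at_depth(X, depth)[-1] for X in paths}
        if len(edges) == k:
            return True
    return False

def is_k_hash_code(code, k):
    """A code is k-hashing if every choice of k distinct codewords is hashed."""
    n = len(code[0])
    return all(hashed_via_tree([path_nodes(w) for w in subset], n)
               for subset in combinations(code, k))

small_code = [(0, 1, 2), (0, 2, 2), (1, 0, 2)]
print(is_k_hash_code(small_code, 3))
print(is_k_hash_code(small_code + [(1, 1, 2)], 3))
```

Checking a code this way inspects every $k$-subset of codewords, so the cost grows as $\binom{|C|}{k}$ and the sketch is only practical for small codes.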

A central theorem establishes that for words of any length $n$, finite or infinite, no MSO formula defines the $k$-hashing relation (Costa et al., 16 Sep 2025). The proof uses Ehrenfeucht–Fraïssé games for MSO: even over paths differing at a unique coordinate, duplicator strategies exist so that no MSO formula of bounded rank can distinguish the tuples $(X_0,X_1,\ldots,X_{k-1})$ and $(Y_0,X_1,\ldots,X_{k-1})$ with $X_0 \neq Y_0$, so no candidate formula can define the relation.

3. Overcoming Limitations via Counting Extensions

The inexpressibility of $k$-hashing in plain MSO is traced to the inability of MSO to express exact cardinalities of sets in the absence of counting quantifiers. While MSO can existentially assert the presence of paths through a given level, it cannot constrain the number of such intersections to be exactly one per set, nor can it enforce that the selected nodes are all distinct.

Adding an equi-cardinality predicate $\mathrm{eqcard}(U, V)$, true iff $|U| = |V| < \infty$, yields an extension denoted MSO+$\mathrm{eqcard}$. In MSO+$\mathrm{eqcard}$, one can define:

Singletons: $U$ is a singleton iff $\exists x\,(x \in U \wedge \mathrm{eqcard}(U, \{x\}))$.
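$k$-distinction: for singletons $U_0,\ldots,U_{k-1}$, $\bigwedge_{i<j} \neg(U_i \subseteq U_j)$; any two singletons are equi-cardinal, so pairwise distinctness of the selected nodes is stated through non-inclusion rather than through $\mathrm{eqcard}$.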

Thus, the $(b,k)$-hashing property becomes MSO+$\mathrm{eqcard}$-definable by existentially guessing a level $D$, picking singleton intersections $U_i$ for each path, and requiring all $U_i$ to be mutually distinct nodes. This extension suffices, and in fact exact cardinality, or more generally the ability to express $|U| = |V| + c$ for small $c$, is the minimal necessary augmentation (Costa et al., 16 Sep 2025). Modulo-counting quantifiers (CMSO) or full Presburger arithmetic also suffice.
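The intended semantics of the counting predicate can be sketched on finite Python sets. This is a minimal illustration, assuming finite sets throughout; the names `eqcard`, `is_singleton`, and `card_offset` are hypothetical, and `card_offset` phrases $|U| = |V| + c$ through $\mathrm{eqcard}$ by removing $c$ elements from $U$, which is one possible encoding rather than the one used in the cited paper.

```python
from itertools import combinations

def eqcard(U, V):
    """eqcard(U, V): the finite sets U and V have the same number of elements."""
    return len(U) == len(V)

def is_singleton(U):
    """Exists x in U such that U is equi-cardinal with {x}."""
    return any(eqcard(U, {x}) for x in U)

def card_offset(U, V, c):
    """|U| = |V| + c: some c elements can be removed from U to reach the size of V."""
    return any(eqcard(set(U) - set(S), V) for S in combinations(U, c))

print(is_singleton({"node"}), is_singleton(set()), is_singleton({1, 2}))
print(card_offset({1, 2, 3}, {"a"}, 2), card_offset({1, 2, 3}, {"a"}, 1))
```

The empty set fails the singleton test because the existential quantifier has no witness, matching the role of $\exists x\,(x \in U)$ in the formula.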

Logic	Can Define $k$ -hashing?	Minimal Counting Feature
MSO	No	None
MSO+ $\mathrm{eqcard}$	Yes	Equi-cardinality (exact counting)
CMSO	Yes	Modulo counting

4. Parameterization and MSO k-ary Query Structure

A complementary direction is provided by the finer reparameterisation theorem for MSO/FO queries on strings (Nguyên et al., 6 Dec 2025). Suppose $\varphi(\vec{x})$ is a $k$-ary MSO query over words, and $\eta_1(x),\ldots,\eta_\ell(x)$ are unary MSO formulas. If $|Q_\varphi(w)| = O(|Q_{\eta_1}(w)| \cdots |Q_{\eta_\ell}(w)|)$ for all $w$, then each solution $\vec{i}$ to $\varphi$ can be MSO-definably encoded by an $\ell$-tuple $(j_1,\ldots,j_\ell)$ with $w \models \eta_m(j_m)$ for each $m$, up to $O(1)$ ambiguity: at most a constant number of solutions share the same encoding. Formally, an MSO formula $\psi(\vec{x}; \vec{y})$ defines a total function from $Q_\varphi(w)$ to tuples of positions witnessing the $\eta$’s.
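A toy instance may help fix the shape of the statement. The query below is an assumption chosen for illustration and does not come from the cited paper: $\varphi(x,y)$ holds when $x$ carries a 0 and $y$ is adjacent to $x$, with the single parameter formula $\eta(x) := P_0(x)$. The result set has size at most $2\cdot\#0(w)$, and sending $(x,y)$ to $x$ is an encoding in which each parameter is shared by at most two solutions. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

def solutions(w):
    """Q_phi(w) for phi(x, y) := P_0(x) and |x - y| = 1, over 1-based positions."""
    n = len(w)
    return {(x, y) for x, y in product(range(1, n + 1), repeat=2)
            if w[x - 1] == "0" and abs(x - y) == 1}

def encode(solution):
    """psi: map a solution (x, y) to the parameter position j = x, which satisfies eta."""
    x, _ = solution
    return x

def max_ambiguity(w):
    """Largest number of solutions that share one parameter position."""
    counts = Counter(encode(s) for s in solutions(w))
    return max(counts.values(), default=0)

w = "00100"
print(len(solutions(w)), w.count("0"), max_ambiguity(w))
```

Here a binary query is funneled through one parameter position because its output grows linearly in $\#0(w)$; the theorem asserts that such an encoding exists, and is MSO-definable, whenever the growth hypothesis holds.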

This result leverages automata-theoretic machinery: recognition by a monoid, construction of factorization forests (Simon's theorem), and a "points-to" graph encoding dependencies among tuple components. Pumping arguments and Hall’s theorem underlie the boundedness and surjectivity conditions. A key consequence is dimension minimization: if an FO string-to-string interpretation of dimension $d$ produces outputs of size $O(|w|^\ell)$, then there is a dimension-$\ell$ FO interpretation with the same behavior (Nguyên et al., 6 Dec 2025).
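Of the ingredients listed above, recognition by a monoid is the easiest to show in a few lines. The sketch below is a standard textbook example rather than anything specific to the cited proof: the language "even number of 0's" is recognized by the two-element monoid $\mathbb{Z}/2\mathbb{Z}$ through a morphism counting 0's modulo 2. The names `mul`, `h`, `image`, and `has_even_zeros` are hypothetical, and factorization forests themselves are not constructed here.

```python
from functools import reduce

IDENTITY = 0  # neutral element of Z/2Z under addition

def mul(a, b):
    """Monoid operation of Z/2Z: addition modulo 2."""
    return (a + b) % 2

def h(letter):
    """Image of a single letter: '0' maps to 1, every other letter to the identity."""
    return 1 if letter == "0" else 0

def image(word):
    """Extend h to words by multiplying the images of the letters from left to right."""
    return reduce(mul, (h(a) for a in word), IDENTITY)

def has_even_zeros(word):
    """Membership test: the word is accepted iff its image lies in the accepting set {0}."""
    return image(word) == 0

u, v = "01", "00"
# The morphism property image(uv) = image(u) * image(v) is what makes
# infix-by-infix reasoning (and hence factorization forests) possible.
print(image(u + v) == mul(image(u), image(v)), has_even_zeros("0110"))
```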

5. Practical and Theoretical Implications

Expressiveness boundaries for MSO k-ary queries have deep repercussions:

Coding theory: The inexpressibility of $k$-hashing in MSO pinpoints why certain code properties, such as trifference, cannot be fully captured in tree or word MSO, and why exact counting must be imported as a logical primitive (Costa et al., 16 Sep 2025).
Transductions and query optimization: The parameterization result clarifies that output tuple structure (for queries and transductions) can, under boundedness hypotheses, be definably funneled through a small number of anchors or "parameter" positions. This has both descriptive and computational ramifications for automata-based transformations and canonical representations (Nguyên et al., 6 Dec 2025).
Descriptive complexity and decision procedures: Knowing precisely which $k$-ary queries become definable under which logical extensions enables the design of optimal logical fragments for specification, verification, and synthesis systems, and determines the necessity (or redundancy) of counting features.

The phenomena observed for $k$-ary queries generalize in several directions:

Pairwise distinctness and rainbow witnesses: General $k$-ary relations demanding existence of coordinates with $k$-wise distinct values evade MSO expressibility by the same mechanism as $(b,k)$-hashing (Costa et al., 16 Sep 2025). These patterns encompass "rainbow" positions for colorings and generalizations of mutual distinctness constraints.
Extensions and minimality: MSO extended by equi-cardinality ($\mathrm{eqcard}$), CMSO (modulo counting), or full Presburger arithmetic each suffice to define these relations. The minimality result, that equi-cardinality is both necessary and sufficient for $k$-hashing, sharpens the landscape of definability.

A plausible implication is that these results delimit the features required of any logical framework meant to capture fine-grained combinatorial or coding-theoretic constraints through logical queries, and that they can guide further research on logics for combinatorial structure analysis.