SymSeqBench: Unified Symbolic Sequence Benchmark
- SymSeqBench is a unified open-source framework for generating, analyzing, and evaluating rule-based symbolic sequences, leveraging formal language theory and systematic complexity control.
- It integrates modular Python packages for grammar inference, sequence synthesis, and metric computation, standardizing benchmarks across cognitive science, AI, neuromorphic, and behavioral research.
- Its design supports reproducible research through YAML configurations, PyTorch integration, and extensibility for custom tasks, embeddings, and experimental paradigms.
SymSeqBench is a unified open-source framework and benchmark suite for the generation, analysis, and evaluation of sequences produced by rule-based symbolic systems. It enables formal specification, systematic complexity control, and rigorous assessment of symbolic sequence processing across cognitive, AI, neuromorphic, and behavioral domains, providing direct connections to formal language theory (FLT) and computational models of sequential structure (Zajzon et al., 31 Dec 2025).
1. Theoretical Foundations: Formal Language Structures
SymSeqBench operationalizes constructs from formal language theory. The foundational elements are:
- Alphabets and Words: An alphabet Σ is a finite set of symbols; words are finite sequences over Σ; languages are sets of such words.
- Grammar Classes: The principal grammar type is the regular grammar (Type-3 in the Chomsky hierarchy), which can be specified by right-linear rules or by a deterministic finite automaton (DFA) M = (Q, Σ, δ, q₀, F), where Q is a finite state set, δ: Q × Σ → Q is a transition function, q₀ ∈ Q the start state, and F ⊆ Q the set of accepting states. Context-free grammars are also supported at the analytical level (Zajzon et al., 31 Dec 2025).
- Language Generation and Induction: SymSeqBench’s generators use regular grammars for tractable synthesis and analysis, leveraging the Myhill–Nerode theorem for minimal DFA construction. Exact inference of a minimal DFA from labeled examples is NP-hard, and Gold's theorem constrains what is learnable in the limit from positive data alone.
- Complexity and Computability: Topological entropy (TE) quantifies grammar complexity as h = log λ_max(A), with A the DFA's adjacency matrix and λ_max its principal eigenvalue.
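Under this definition, TE can be computed directly from a DFA's transition structure. A minimal NumPy sketch (the three-state automaton below is an arbitrary illustration, not one of the package's grammars):

```python
import numpy as np

def topological_entropy(adjacency):
    """TE = log of the largest-magnitude eigenvalue of the adjacency matrix."""
    eigenvalues = np.linalg.eigvals(adjacency)
    return float(np.log(np.max(np.abs(eigenvalues))))

# A[i][j] counts the symbols that move the DFA from state i to state j.
A = np.array([[0, 2, 0],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)

# Every state has out-degree 2 (constant row sums), so the principal
# eigenvalue is exactly 2 and h = log 2.
h = topological_entropy(A)
```

Because each row of A sums to 2, the Perron eigenvalue is exactly 2, making the expected TE value (log 2) easy to verify by hand.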
2. Framework Architecture and Core Components
SymSeqBench consists of two main modular Python packages:
- SymSeq: Provides grammar inference, sequence generation, programmatic symbolic analysis, and metric computation.
- Parser: Learns first-order transition models from discrete strings or converts continuous time series via Symbolic Aggregate Approximation (SAX).
- Language Generators: Include canonical grammars (Reber, Dyck-1, random DFA), cognitive paradigms (n-back, 12AX, bracket-completion), and user-specified parameterized grammars.
- Metrics Module: Computes syntactic and information-theoretic metrics at token, string, set, and grammar levels, including entropy measures, Lempel–Ziv complexity, normalized compression distance (NCD), and TE.
- Tasks: Recognition (legal-word judgment or next-symbol prediction), transduction (e.g., reverse, odds-first sorting), and intrusion detection (drawn from cognitive paradigms).
- SeqWrapper: Configuration-based orchestration of generator, parser, and task modules for streamlined dataset creation and analysis.
- SeqBench: Constructs datasets, applies token/embedding transformations, and implements ML integration.
- DatasetGenerator: Controls parallel or single-threaded symbolic sequence creation, retains generative grammar and metadata for reproducibility.
- SeqDataset: Implements PyTorch dataset interface, applies embedding mappings (one-hot vectors, learned, or dataset-based like MNIST/GSC), supports perturbation injection (noise, temporal jitter).
- Symbolic Embedding Pipeline: Supports audio (MFCCs), vision (CNNs, spatial transforms), and spike encoding for neuromorphic benchmarking.
- Analysis Utilities: Provides embedding rank, signal-to-noise, geometric and distance metrics.
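The embedding and perturbation steps of SeqDataset can be illustrated with a minimal pure-Python sketch; the class name, constructor, and indexing behavior here are illustrative assumptions, not the package's actual SeqDataset API:

```python
import random

class OneHotSeqDataset:
    """Illustrative dataset: one-hot embeds symbol sequences and optionally
    injects symbol-substitution noise (a simple perturbation model)."""

    def __init__(self, sequences, alphabet, noise_prob=0.0, seed=0):
        self.sequences = sequences
        self.alphabet = list(alphabet)
        self.index = {s: i for i, s in enumerate(self.alphabet)}
        self.noise_prob = noise_prob
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, i):
        encoded = []
        for sym in self.sequences[i]:
            # Perturbation injection: replace the symbol with a random one.
            if self.rng.random() < self.noise_prob:
                sym = self.rng.choice(self.alphabet)
            vec = [0.0] * len(self.index)
            vec[self.index[sym]] = 1.0
            encoded.append(vec)
        return encoded

ds = OneHotSeqDataset([["A", "B", "A"]], alphabet="AB")
x = ds[0]  # -> [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
```

The real SeqDataset additionally supports learned and dataset-based embeddings (e.g., MNIST/GSC) and exposes the PyTorch dataset interface, so it drops directly into a torch DataLoader.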
3. Sequence Synthesis and Analysis Methodologies
- Sequence Generation: Core logic samples from a regular grammar’s stochastic transition graph, with user-specified sequence length and symbol-level noise injection.
- The generation pseudocode iteratively transitions through the DFA, emitting symbols and optionally introducing random noise (explicitly detailed in the documentation).
- Complexity-Guided Synthesis: Grammar sampling can be targeted for a prescribed TE by ERGM+Glauber dynamics, controlling for out-degree and other graph statistics.
- Metric Computation: Metrics are organized by granularity:
- Token-level: empirical frequencies, n-gram distribution.
- String-level: Shannon entropy, Lempel–Ziv complexity, block entropy.
- Set-level: NCD, associative chunk strength (ACS), mean edit distance.
- Grammar-level: TE, Markov order (BIC), mutual information decay.
- Workflow:
- Data: YAML or dictionary configuration defines generators/tasks.
- Generation: Single script call produces train/test splits, storing run information for reproducibility.
- Analysis: Grammar and sequence metrics can be computed on demand programmatically.
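The generate-then-measure workflow above can be sketched end to end. Everything below is illustrative: the toy two-state transition graph and all function names are assumptions for exposition, not the SymSeq API:

```python
import math
import random
from collections import Counter

# Toy stochastic transition graph: state -> [(symbol, next_state), ...].
TRANSITIONS = {
    0: [("A", 1), ("B", 0)],
    1: [("B", 0), ("C", 1)],
}

def generate(length, noise=0.0, seed=0):
    """Walk the transition graph, emitting symbols; with probability
    `noise`, replace the emitted symbol with a random alphabet symbol."""
    rng = random.Random(seed)
    alphabet = sorted({s for edges in TRANSITIONS.values() for s, _ in edges})
    state, out = 0, []
    for _ in range(length):
        sym, state = rng.choice(TRANSITIONS[state])
        if rng.random() < noise:
            sym = rng.choice(alphabet)
        out.append(sym)
    return out

def shannon_entropy(seq):
    """String-level Shannon entropy of the empirical symbol distribution."""
    counts, n = Counter(seq), len(seq)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def lz_complexity(seq):
    """Distinct-phrase count from a simple LZ78-style parse."""
    phrases, current = set(), ""
    for sym in seq:
        current += sym
        if current not in phrases:
            phrases.add(current)
            current = ""
    return len(phrases) + (1 if current else 0)

seq = generate(200, noise=0.05)
h = shannon_entropy(seq)   # bounded by log2(3) for a 3-symbol alphabet
c = lz_complexity(seq)
```

The same pattern scales to the library's full pipeline: a grammar object supplies the transition graph, and the metrics module supplies block entropy, NCD, and the other measures listed above.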
4. Benchmark Suite and Evaluation Protocols
SeqBench encapsulates a diverse battery of cognitive, neural, and machine learning benchmarks:
- Experimental Psycholinguistics: Artificial grammar learning (AGL), non-adjacent dependencies (NAD), balanced positive/negative string sets, metrics such as balanced accuracy and Cohen’s κ.
- Cognitive Psychology: n-back and DMS paradigms, sequential probe tasks.
- Behavioral Analysis: Discretization of ethograms, assessment of sequence complexity in real animal behavior.
- Neuromorphic Computing: Spiking network benchmarks (Seq SHD/SSC/GSC); SNNs, LIF/adLIF models evaluated alongside ANNs (GRU, Transformer, Mamba), performance assessed as a function of sequence entropy and Markov order.
- Artificial Intelligence: Next-symbol prediction, transduction (copy/reverse), bracket-completion.
- Evaluation Metrics: Accuracy, Cohen’s κ (for chance-corrected assessment), area under learning curve (AUC), and perplexity.
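Among these metrics, Cohen's κ corrects raw agreement for chance. A self-contained sketch of the standard formula κ = (p_o − p_e)/(1 − p_e), where p_o is observed agreement and p_e the agreement expected from the marginal label frequencies:

```python
from collections import Counter

def cohens_kappa(y_true, y_pred):
    """Chance-corrected agreement between true and predicted labels."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    labels = set(y_true) | set(y_pred)
    p_e = sum(true_counts[l] * pred_counts[l] for l in labels) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Binary grammaticality judgments: 4/6 observed agreement, balanced
# marginals give p_e = 0.5, so kappa = (2/3 - 1/2) / (1/2) = 1/3.
k = cohens_kappa([1, 0, 1, 1, 0, 0], [1, 0, 1, 0, 0, 1])
```

Perfect agreement yields κ = 1, while a classifier matching the chance rate yields κ near 0, which is why κ rather than raw accuracy is the preferred score for balanced positive/negative string sets.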
The benchmark distribution covers a broad complexity regime by sampling grammars at controlled TE and state-space densities, enabling robust stress-testing of learning models against discretely parameterized sequence statistics.
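The TE-targeted sampling idea can be caricatured as a local search over adjacency matrices. This greedy sketch only approximates the ERGM+Glauber procedure described above: it flips single entries and keeps moves that bring TE no farther from the target, and it does not control out-degree or other graph statistics:

```python
import numpy as np

def spectral_te(M):
    """TE = log of the spectral radius; -inf for the zero matrix."""
    lam = float(np.max(np.abs(np.linalg.eigvals(M))))
    return np.log(lam) if lam > 0 else float("-inf")

def sample_graph_at_te(n_states, target_te, steps=2000, seed=0):
    """Greedy local search over 0/1 adjacency matrices toward a target TE."""
    rng = np.random.default_rng(seed)
    A = np.ones((n_states, n_states))       # start from the complete graph
    best = abs(spectral_te(A) - target_te)
    for _ in range(steps):
        i, j = rng.integers(n_states, size=2)
        A[i, j] = 1 - A[i, j]               # propose a single-entry flip
        err = abs(spectral_te(A) - target_te)
        if err <= best:
            best = err                      # accept: TE moved toward target
        else:
            A[i, j] = 1 - A[i, j]           # reject: undo the flip
    return A, spectral_te(A)

A, h = sample_graph_at_te(4, target_te=np.log(2))
```

A proper Glauber scheme would accept some error-increasing moves stochastically to avoid local minima; the greedy variant suffices to show how a prescribed TE becomes a sampling target.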
5. Implementation, Extensibility, and Usage Patterns
- Dependencies: Requires Python ≥3.7, torch ≥1.10, and auxiliary packages (yaml, saxpy, etc.). Optional support for NEST (spiking simulation), NumPy, SciPy, pandas.
- User Interface: YAML or direct API for configuration; high-level wrapper (SeqWrapper) exposes standard interfaces for data, grammar access, and analysis.
- Machine Learning Integration: Exports data as PyTorch datasets. Supports direct batching and perturbation via torch DataLoader.
- Extensibility:
- Grammar extension: Subclass RegularGrammar, implement specific transitions.
- Task addition: Register in the Tasks module.
- Custom embeddings: Implement an Embedding class exposing an `.encode(symbol)` interface.
- Community contribution: Open development at https://github.com/symseqbench/symseq and https://github.com/symseqbench/seqbench.
- API Patterns: Includes setup, batch data generation, and evaluation loop samples, supporting rapid prototyping and reproducibility.
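As an illustration of the embedding extension point, here is a minimal custom embedding. The base-class shape is assumed from the `.encode(symbol)` interface named above, not taken from the package, and BinaryCodeEmbedding is a hypothetical subclass:

```python
class Embedding:
    """Assumed base-class shape: subclasses map a symbol to a vector."""
    def encode(self, symbol):
        raise NotImplementedError

class BinaryCodeEmbedding(Embedding):
    """Hypothetical extension: encode each alphabet symbol as a
    fixed-width binary code vector instead of a one-hot vector."""

    def __init__(self, alphabet, width=4):
        self.codes = {
            sym: [float(b) for b in format(i, f"0{width}b")]
            for i, sym in enumerate(alphabet)
        }

    def encode(self, symbol):
        return self.codes[symbol]

emb = BinaryCodeEmbedding("ABC")
vec = emb.encode("B")  # -> [0.0, 0.0, 0.0, 1.0]
```

Registering such a class alongside the built-in one-hot, learned, and dataset-based embeddings would let the same symbolic dataset be replayed under different input codes.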
6. Applications and Impact Across Disciplines
SymSeqBench facilitates experimental, computational, and theoretical advances across domains:
- Psycholinguistics: Enables generation of parametric AGL datasets, systematic manipulation of grammatical structure and chunk strength, and stratified sampling by linguistic statistics.
- Neuroscience and Neuromorphics: Used for benchmarking SNN and reservoir models on sequence memory and context resolution, with comparative studies showing performance scaling with TE.
- Behavioral and Computational Neuroethology: Applied to real animal ethograms; multi-scale information metrics reveal syntactic complexity and superregular structures not accessible from token-level analysis alone.
- AI/ML: Provides challenging, parameterizable sequence-learning tasks for RNNs, transformers, and neuromorphic models, with standardized evaluation metrics and complexity-controlled task generation.
- Empirical Demonstrations: TE-targeted sampling produces controllably complex grammars (documented in benchmark figures); analysis pipelines demonstrate factor-driven dataset construction; FLT-grounded metrics facilitate quantification of learning and representational complexity.
SymSeqBench's standardized interfaces, modular architecture, and formal grounding in FLT establish a rigorous computational framework enabling reproducible experimentation and model evaluation across cognitive science, machine learning, neuromorphic engineering, and behavioral analysis (Zajzon et al., 31 Dec 2025).
7. Relation to Inductive Theorem Proving (OEIS-Derived SymSeqBench)
In a separate context, "SymSeqBench" also refers to a suite of 29,687 mathematical program equivalence problems derived from OEIS sequences, designed to benchmark inductive theorem provers (Gauthier et al., 2023). Each benchmark asserts the equality of two nontrivially distinct programs (in a custom sequence-generating language) for all natural numbers, i.e., ∀n ∈ ℕ: f(n) = g(n) for the two programs f and g. Benchmarks are synthetically generated to challenge arithmetic reasoning and inductive automation, with deep reliance on numeric and mutual induction. Problem complexity ranges from trivial to requiring multi-step inductive reasoning often beyond existing automation capabilities. The benchmark's structure, program language, and evaluation partitioning are detailed in the original reference (Gauthier et al., 2023).
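The flavor of these problems can be conveyed by a hypothetical example (not drawn from the benchmark, and in Python rather than the custom sequence language): two syntactically distinct programs that agree on every natural number, which a prover must establish by induction rather than finite checking:

```python
def triangular_closed(n):
    """Closed-form program for the triangular numbers: n(n+1)/2."""
    return n * (n + 1) // 2

def triangular_loop(n):
    """Iterative program computing the same sequence by summation."""
    total = 0
    for k in range(n + 1):
        total += k
    return total

# Benchmark-style claim: forall n in N, the two programs agree.
# A finite check only gathers evidence; the benchmark demands a proof.
all_agree = all(triangular_closed(n) == triangular_loop(n)
                for n in range(100))
```

Real benchmark instances pair far less obviously equivalent programs, which is what pushes provers toward numeric and mutual induction.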
SymSeqBench thus denotes both a formal-sequence synthetic framework for cognitive/AI benchmarking (Zajzon et al., 31 Dec 2025) and a specialized mathematical equivalence benchmark for inductive reasoning systems (Gauthier et al., 2023), with both usages unified by systematic control of sequential symbolic structure and rigorous automated evaluation.