CategoryScienceClaw Framework
- CategoryScienceClaw is a category-theoretic framework that formalizes scientific discovery as regime transitions over structured categories encoding artifact types and tool signatures.
- It employs functorial artifact transport and Kan extensions to integrate legacy data with novel discoveries, ensuring precise, auditable provenance.
- The approach is exemplified in fiber-network mechanics, where it distinctly separates retrieval, search, and innovation within a self-revising, proof-carrying system.
CategoryScienceClaw comprises a category-theoretic framework and a concrete system for formalizing the structure of scientific discovery in computational research, tracking it auditably, and revising it over time. It provides a mathematical and engineering language for representing scientific workflows as dynamically evolving regimes, where each regime is a category encoding artifact types, tool signatures, and validation gates, and for specifying discovery as a regime transition rather than as the mere generation of answers. The framework has been instantiated in exemplar domains, notably fiber-network mechanics, where all types, tool chains, provenance links, verdicts, and regime extensions are explicitly structured and audited. This approach separates the retrieval, search, and discovery phases without recourse to subjective notions of novelty, supporting the development of robust, proof-carrying, and self-revising AI discovery systems (Wang et al., 31 May 2026).
1. Categorical Regime Structure: Schemas and Transitions
Each scientific "regime" in CategoryScienceClaw is defined by a small category $\mathcal{C}$, whose objects model artifact types (data, models, descriptors, explanatory surrogates) and whose morphisms capture the signatures of skills or computational tools. For example, in a fiber-network mechanics scenario, $\mathcal{C}$ includes objects such as FiberNetwork, FiberCount, and StressStrainData, together with morphisms between them that record the signatures of the tools relating these types.
A regime transition is defined by a functor $F : \mathcal{C} \to \mathcal{C}'$ that extends the previous schema $\mathcal{C}$ to a new schema $\mathcal{C}'$. For instance, when new physical phenomena (e.g., orientation-dependent anisotropy) are discovered, the schema is enlarged with objects such as OrientationTensor and AnisoSurrogate, and with new morphisms corresponding to the computation and validation steps for these entities. The functor $F$ acts as an inclusion of $\mathcal{C}$ into $\mathcal{C}'$, transparently transporting legacy types and operations while exposing the locus of true discovery in the part of $\mathcal{C}'$ that lies outside the image of $F$ (Wang et al., 31 May 2026).
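As a minimal sketch of the schema extension described above, the following Python represents a regime by its generating objects and morphisms and builds the inclusion functor as a pair of dictionaries. The object names come from the text; the morphism names (`count_fibers`, `run_tensile_test`, `compute_orientation`, `fit_aniso_surrogate`) and all class and function names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morphism:
    """A tool signature: a named arrow between two artifact types."""
    name: str
    src: str
    dst: str


@dataclass(frozen=True)
class Schema:
    """A regime schema given by generators only (identities and composites are implicit)."""
    objects: frozenset
    morphisms: frozenset

    def check(self) -> None:
        # Every tool signature must refer to declared artifact types.
        for m in self.morphisms:
            if m.src not in self.objects or m.dst not in self.objects:
                raise ValueError(f"{m.name} refers to an undeclared artifact type")


def extend(old: Schema, new_objects, new_morphisms):
    """Return the enlarged schema and the inclusion functor old -> new as two dicts."""
    new = Schema(old.objects | frozenset(new_objects),
                 old.morphisms | frozenset(new_morphisms))
    new.check()
    on_objects = {o: o for o in old.objects}      # inclusion on objects
    on_morphisms = {m: m for m in old.morphisms}  # inclusion on generating morphisms
    return new, on_objects, on_morphisms


def outside_image(new: Schema, on_objects, on_morphisms):
    """Objects and morphisms of the new schema that the inclusion does not hit."""
    return (new.objects - set(on_objects.values()),
            new.morphisms - set(on_morphisms.values()))


# Hypothetical morphism names; only the object names appear in the text.
old = Schema(
    frozenset({"FiberNetwork", "FiberCount", "StressStrainData"}),
    frozenset({Morphism("count_fibers", "FiberNetwork", "FiberCount"),
               Morphism("run_tensile_test", "FiberNetwork", "StressStrainData")}),
)
new, F_obj, F_mor = extend(
    old,
    {"OrientationTensor", "AnisoSurrogate"},
    {Morphism("compute_orientation", "FiberNetwork", "OrientationTensor"),
     Morphism("fit_aniso_surrogate", "OrientationTensor", "AnisoSurrogate")},
)
novel_objects, novel_morphisms = outside_image(new, F_obj, F_mor)
print(sorted(novel_objects))  # ['AnisoSurrogate', 'OrientationTensor']
```

Storing only generators keeps the sketch short; a fuller treatment would also track composition and any equations between composite tool chains.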
2. State Representation, Provenance, and Artifact Lineage
At each time $t$, the system state is given by a copresheaf $X_t : \mathcal{C} \to \mathbf{Set}$ on the regime category $\mathcal{C}$, mapping every object to the finite set of accepted artifacts of that type, and every morphism to the artifact-level relation induced by tool execution. This functorial structure ensures all provenance is tracked: the “category of elements” $\int X_t$ recovers the full artifact-lineage DAG, where each object is a pair $(c, x)$ for $x \in X_t(c)$, and morphisms $(c, x) \to (c', x')$ are mediated by $f : c \to c'$ in $\mathcal{C}$ with $X_t(f)(x) = x'$.
This construction yields a globally precise and queryable provenance trace, essential for both retrieval (locating and reusing prior artifacts) and for auditing the exact action of previously applied tools and skills within the historical context of a regime (Wang et al., 31 May 2026).
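The lineage construction can be sketched in a few lines of Python. This is an illustrative simplification, not the system's implementation: the state is a dictionary from artifact types to sets of artifact identifiers, each tool acts as a plain dictionary from input artifacts to output artifacts, and all identifiers and tool names below are hypothetical toy values.

```python
def category_of_elements(artifacts, actions):
    """Build the lineage graph: nodes are (type, artifact) pairs, edges are tool runs.

    artifacts: dict mapping an artifact type to the set of accepted artifact ids
    actions:   dict mapping a tool name to (src_type, dst_type, {input_id: output_id})
    """
    nodes = {(c, x) for c, xs in artifacts.items() for x in xs}
    edges = set()
    for tool, (src, dst, mapping) in actions.items():
        for x, y in mapping.items():
            if (src, x) not in nodes or (dst, y) not in nodes:
                raise ValueError(f"{tool} refers to an artifact that was never accepted")
            edges.add(((src, x), tool, (dst, y)))
    return nodes, edges


def ancestors(node, edges):
    """All nodes from which `node` can be reached, i.e. its full provenance."""
    parents = {}
    for source, _tool, target in edges:
        parents.setdefault(target, set()).add(source)
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Toy state with hypothetical identifiers and tool names.
artifacts = {
    "FiberNetwork": {"net_001"},
    "FiberCount": {"count_001"},
    "StressStrainData": {"curve_001"},
}
actions = {
    "count_fibers": ("FiberNetwork", "FiberCount", {"net_001": "count_001"}),
    "run_tensile_test": ("FiberNetwork", "StressStrainData", {"net_001": "curve_001"}),
}
nodes, edges = category_of_elements(artifacts, actions)
provenance = ancestors(("FiberCount", "count_001"), edges)
```

Here `provenance` contains the single node `("FiberNetwork", "net_001")`, which is the kind of query a retrieval or audit step would run against the lineage graph.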
3. Kan Extension and the Semantics of Regime Transitions
Discovery, in this formalism, is not simply the addition of new results, but the verified regime transition from $\mathcal{C}$ to $\mathcal{C}'$. The mathematical mechanism for transporting old artifacts into the new schema is the left Kan extension $\mathrm{Lan}_F X_t$ of the current state $X_t : \mathcal{C} \to \mathbf{Set}$ along the schema extension $F : \mathcal{C} \to \mathcal{C}'$. Explicitly, for each object $d$ of $\mathcal{C}'$: $$(\mathrm{Lan}_F X_t)(d) \;\cong\; \operatorname*{colim}_{(c,\; F c \to d) \,\in\, (F \downarrow d)} X_t(c).$$ This ensures that all “old” artifacts which can be coherently assigned new meanings under the extended schema are automatically inherited, while genuinely new artifacts (arising in objects not reachable via $F$) are precisely those representing discovery beyond functorial transport. The universal property of this extension establishes a canonical comparison between the transported state $\mathrm{Lan}_F X_t$ and the new, post-transition state $X_{t+1} : \mathcal{C}' \to \mathbf{Set}$ (Wang et al., 31 May 2026).
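The universal property mentioned above can be written as a bijection of natural transformations. What follows is the standard statement for left Kan extensions, using assumed notation: $X_t$ for the pre-transition state on $\mathcal{C}$, $X_{t+1}$ for the post-transition state on $\mathcal{C}'$, and $F$ for the schema extension. The names $\alpha$, $\kappa$, and $\eta$ are introduced here only for illustration.

```latex
\[
  \mathrm{Nat}\bigl(\mathrm{Lan}_F X_t,\; X_{t+1}\bigr)
  \;\cong\;
  \mathrm{Nat}\bigl(X_t,\; X_{t+1} \circ F\bigr),
  \qquad
  \kappa \;\longmapsto\; \alpha = (\kappa F) \circ \eta ,
\]
% \eta : X_t \Rightarrow (\mathrm{Lan}_F X_t) \circ F is the unit of the extension.
```

Read this way, specifying how each legacy artifact sits inside the new state (the map $\alpha$) determines the comparison map $\kappa$ uniquely.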
4. Proof-Carrying Knowledge-Computation Graph
CategoryScienceClaw augments the copresheaf structure with verification, discourse, and publication. Artifacts are validated by gates (verifiers), which may encode statistical criteria (e.g., AIC, MDL) or scientific accept/reject logic; discourse is modeled as a category tracking claims, objections, and citations; and a publication functor connects internal provenance with public knowledge.
Open needs are explicit holes in the provenance structure, corresponding to missing artifact targets waiting to be filled. Workflow mutation is a functorial update within the current regime, supporting both stochastic and policy-driven search. Stress tests and the public auditability of all verdicts and model selections are formalized as committed artifact branches and verdicts, recorded at every regime update (Wang et al., 31 May 2026).
5. Fiber-Network Mechanics Example: Mechanization of Discovery
In the canonical fiber-network example, two types of surrogates are proposed for tensile stiffness:
- An isotropic surrogate, based on an isotropic descriptor.
- An anisotropic surrogate, defined in terms of the leading eigenvector of the orientation tensor.
The first, the isotropic surrogate, is scored by an AIC gate and rejected, remaining in the provenance graph as a contrast artifact. The regime is then extended to admit orientation tensors and the more expressive anisotropic surrogate, which is accepted after evidence accumulation and stress testing, once its AIC score is superior and its residual diagnostics pass. The Kan extension shows that the anisotropic surrogate is genuinely “novel” in the regime-theoretic sense: no functorial transport from the old regime to the extended one could recover it from legacy artifacts. All discoveries, rejected alternatives, and the exact justification for regime mutation are auditable in the proof-carrying graph (Wang et al., 31 May 2026).
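An AIC gate of the kind described above could look roughly like the following sketch. It assumes least-squares fits with Gaussian errors, for which AIC equals $n \ln(\mathrm{RSS}/n) + 2k$ up to an additive constant, and it assumes a gate that compares a candidate against a reference model with a minimum improvement of 2 AIC units. The threshold, the function names, and every number below are hypothetical toy values, not results from the source.

```python
import math


def gaussian_aic(n_obs: int, rss: float, n_params: int) -> float:
    """AIC of a least-squares fit with Gaussian errors, up to an additive constant."""
    if n_obs <= 0 or rss <= 0:
        raise ValueError("n_obs and rss must be positive")
    return n_obs * math.log(rss / n_obs) + 2 * n_params


def aic_gate(candidate_aic: float, reference_aic: float, min_improvement: float = 2.0) -> str:
    """Accept the candidate only if it lowers AIC by at least `min_improvement`."""
    return "accept" if reference_aic - candidate_aic >= min_improvement else "reject"


# Toy numbers for illustration only (not taken from the source).
n = 50
aic_isotropic = gaussian_aic(n, rss=12.0, n_params=2)
aic_anisotropic = gaussian_aic(n, rss=4.0, n_params=3)

verdicts = {
    "anisotropic_vs_isotropic": aic_gate(aic_anisotropic, aic_isotropic),
    "isotropic_vs_anisotropic": aic_gate(aic_isotropic, aic_anisotropic),
}
```

With these toy inputs the anisotropic candidate passes the gate and the isotropic one does not. In a proof-carrying setting both verdicts, including the rejection, would be stored alongside the artifacts they refer to.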
6. Retrieval, Search, and Verified Discovery
CategoryScienceClaw makes a sharp technical distinction:
- Retrieval is the addition of artifacts that can already be produced within the current regime $\mathcal{C}$.
- Search is endofunctorial iteration and refinement: hill-climbing over artifact states via functorial updates within that regime.
- Discovery is a regime transition: a structural schema extension $F : \mathcal{C} \to \mathcal{C}'$ together with an audit, in which the difference between the transported state $\mathrm{Lan}_F X_t$ and the new state $X_{t+1}$ quantifies the genuine content added.
This sequence is formalized: any regime upgrade is paired not only with artifact inheritance via Kan extension but also with a regime audit via the comparison map $\mathrm{Lan}_F X_t \Rightarrow X_{t+1}$, with the residuals (those artifacts in $X_{t+1}$ not in the image of this map) defining the scope of discovery.
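The residual computation in the audit step can be sketched as a set difference. In this simplified illustration the comparison map is supplied as input data rather than derived from a Kan extension, and the state, the map, and all identifiers are hypothetical toy values.

```python
def discovery_residual(new_state, comparison):
    """Artifacts of the new state that the comparison map does not reach.

    new_state:  dict mapping an artifact type to the set of accepted artifact ids
    comparison: dict mapping an artifact type to {transported_id: new_state_id}
    """
    residual = {}
    for obj, accepted in new_state.items():
        image = set(comparison.get(obj, {}).values())
        stray = image - accepted
        if stray:
            # The comparison map must land inside the accepted artifacts.
            raise ValueError(f"comparison map leaves the new state at {obj}: {sorted(stray)}")
        unreached = accepted - image
        if unreached:
            residual[obj] = unreached
    return residual


# Toy post-transition state and comparison map (assumed, not computed).
new_state = {
    "FiberNetwork": {"net_001"},
    "FiberCount": {"count_001"},
    "AnisoSurrogate": {"aniso_001"},
}
comparison = {
    "FiberNetwork": {"net_001": "net_001"},
    "FiberCount": {"count_001": "count_001"},
}
residual = discovery_residual(new_state, comparison)
```

In this toy case the residual is `{"AnisoSurrogate": {"aniso_001"}}`. An empty residual would indicate that the upgrade added no content beyond what transport already provides.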
7. Implications, Significance, and Engineering Practice
CategoryScienceClaw establishes a mathematically precise, proof-carrying, and dynamically self-revising foundation for agentic scientific discovery. It enables:
- Explicit control and auditable tracking of every skill/tool, artifact, verifier, regime extension, and public claim.
- Mechanized separation of legacy (retrieval), regime-internal optimization (search), and innovation (verified discovery).
- Rigorous support for open-ended but accountable evolution of computational science, without requiring arbitrary numeric "novelty" scores.
A plausible implication is that as scientific workflows and verification processes become increasingly machine-mediated, category-theoretic regime tracking and functorial artifact transport will underpin scalable, trustworthy, and self-updating agentic scientific platforms (Wang et al., 31 May 2026).