CategoryScienceClaw System
- CategoryScienceClaw is a categorical knowledge–computation graph architecture that encodes scientific discovery as a self-revising process using category theory.
- It represents research workflows as typed morphisms and objects, enabling rigorous provenance tracking, automated verification, and evidence generation.
- The system distinctly separates retrieval, search, and genuine discovery through functorial transport and regime transitions, ensuring extensibility and reproducibility.
CategoryScienceClaw is a categorical knowledge–computation graph architecture designed to formalize, audit, and mechanize scientific discovery as a self-revising process. Drawing from category theory, CategoryScienceClaw encodes every state in a research workflow—including skills, artifacts, provenance, workflow mutations, needs, verification gates, and discourse—as typed morphisms and objects in a schema category. By separating retrieval, search, and true discovery via regime transitions and functorial transport, it enables AI systems to generate and certify not just answers but genuinely new types of evidence, hypotheses, and operational workflows, backed by rigorous provenance and gatekeeping.
1. Categorical Foundations
Central to CategoryScienceClaw is the formalization of a scientific “regime” as a tuple $(\mathcal{C}, G, V, L)$, where:
- $\mathcal{C}$ is a small category (the schema) whose objects are artifact types (e.g., FiberNetwork, OrientationTensor) and whose morphisms are skill signatures (e.g., computeOrientation: FiberNetwork $\to$ OrientationTensor).
- $G$ is a grammar over $\mathcal{C}$ encoding workflow composition.
- $V$ is a gate or verifier predicate on copresheaves (states), e.g., an AIC threshold for model selection.
- $L$ is an optional description-length functional (MDL/AIC).
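At each timestep $t$, the system state is a copresheaf $X_t : \mathcal{C} \to \mathbf{Set}$ on the schema category $\mathcal{C}$; for each type $c$, $X_t(c)$ is the set of accepted artifacts of type $c$. Morphisms $f : c \to d$ are realized as functions $X_t(f) : X_t(c) \to X_t(d)$. The full provenance graph is the category of elements $\int X_t$, whose objects are pairs $(c, x)$ with $x \in X_t(c)$ and whose morphisms are $f : (c, x) \to (d, y)$ whenever $X_t(f)(x) = y$.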
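The copresheaf and its category of elements can be illustrated on a finite toy schema. The following is a minimal sketch, not the system's implementation: the artifact ids (`net1`, `ot1`, ...), the dictionary encoding, and the helper names `is_copresheaf` and `category_of_elements` are assumptions made for illustration, and only generating skills are listed (identities and composites are omitted).

```python
# Hypothetical toy schema: two artifact types and one skill signature.
objects = ["FiberNetwork", "OrientationTensor"]
morphisms = {"computeOrientation": ("FiberNetwork", "OrientationTensor")}

# Copresheaf on objects: each type maps to its set of accepted artifact ids.
X = {
    "FiberNetwork": {"net1", "net2"},
    "OrientationTensor": {"ot1", "ot2"},
}

# Copresheaf on morphisms: each skill is realized as a function between fibers,
# stored here as a lookup table (an illustrative encoding).
X_mor = {"computeOrientation": {"net1": "ot1", "net2": "ot2"}}


def is_copresheaf(morphisms, X, X_mor):
    """Check that each realized skill is a total function between the right fibers."""
    for name, (src, dst) in morphisms.items():
        table = X_mor[name]
        if set(table) != X[src]:             # must be defined on the whole source fiber
            return False
        if not set(table.values()) <= X[dst]:  # must land in the target fiber
            return False
    return True


def category_of_elements(objects, morphisms, X, X_mor):
    """Return (nodes, edges) of the provenance graph generated by the listed skills."""
    nodes = {(c, x) for c in objects for x in X[c]}
    edges = {
        ((src, x), name, (dst, X_mor[name][x]))
        for name, (src, dst) in morphisms.items()
        for x in X[src]
    }
    return nodes, edges


nodes, edges = category_of_elements(objects, morphisms, X, X_mor)
```

Each edge records which skill carried which artifact to which result, which is the sense in which the category of elements serves as a provenance graph.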
Discovery, defined categorically, is a verified regime transition $(F, \alpha)$ from the current schema $\mathcal{C}$ to an extended schema $\mathcal{C}'$, in which existing artifacts are functorially transported via the left Kan extension $\mathrm{Lan}_F X_t$ of the current state $X_t$ along the schema functor $F$; the system defines residual content at new types as that which cannot be generated by transport alone. The gate $V$ certifies the new state post-transition. This approach separates retrieval (in-schema artifact addition), search (endofunctorial update), and genuine discovery (schema extension with residual content) (Wang et al., 31 May 2026).
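For reference, the transport step can be written with the standard pointwise formula for a left Kan extension of a set-valued functor. The symbols below ($F : \mathcal{C} \to \mathcal{C}'$ for the schema functor, $X_t$ and $X_{t+1}$ for the old and new states, $\mathrm{Res}$ for the residual) are notation assumed here for illustration; the residual is written as a set difference, which is one reading of "content not generated by transport alone".

```latex
\[
(\mathrm{Lan}_F X_t)(c') \;\cong\;
\operatorname*{colim}_{(c,\; u \colon F(c) \to c') \,\in\, (F \downarrow c')} X_t(c),
\qquad
\mathrm{Res}(c') \;=\;
X_{t+1}(c') \setminus
\operatorname{im}\bigl((\mathrm{Lan}_F X_t)(c') \to X_{t+1}(c')\bigr).
\]
```

The comparison map $\mathrm{Lan}_F X_t \Rightarrow X_{t+1}$ used in the second expression is the one induced, through the adjunction between left Kan extension and restriction along $F$, by a natural transformation $X_t \Rightarrow X_{t+1} \circ F$.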
2. Knowledge–Computation Graph Architecture
CategoryScienceClaw is instantiated as an executable, proof-carrying knowledge–computation graph over the ScienceClaw execution substrate and the Infinite discourse substrate:
- Typed skills become schema morphisms; a registry of tools or operations is encoded as the set of morphisms of the schema category.
- Immutable artifacts are elements of the fibers of the copresheaf: each artifact $a$ has a type $\mathrm{type}(a) = c$, so $a \in X_t(c)$.
- Provenance is encoded multicategorically: every artifact $a$ stores its parent tuple $(p_1, \dots, p_n)$ and the skill $s$ that produced it. This yields colored-operad (multi-parent) edges, which generalize the unary edges of the category of elements.
- Open needs are explicit typed holes (unfilled objects or cones in the provenance graph); the ArtifactReactor component proposes completions based on schema overlap.
- Workflow mutation is formalized as copresheaf refinements $\rho : X_t \Rightarrow X_{t+1}$ that injectively embed old artifacts, which may be marked superseded or inactive. These refinements are natural transformations, and they canonically lift to functors $\int X_t \to \int X_{t+1}$ between the corresponding categories of elements.
- Verification gates and stress tests are functors or predicates on copresheaves. Gates such as AIC, MDL, or domain-specific criteria decide regime transitions. Stress tests are evidence-generating skill calls that trigger reevaluation.
- Public discourse is modeled as a category $\mathcal{D}$ of claims, posts, and replications, with a publication functor $P : \int X_t \to \mathcal{D}$ translating provenance into discourse. Comments, votes, and reputation are expressed as morphisms or functors over $\mathcal{D}$.
The global categorical state at time $t$ bundles the schema, the copresheaf with its provenance graph, and the discourse layer, providing a complete, audit-ready system snapshot (Wang et al., 31 May 2026).
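The multi-parent provenance and open-needs ideas above can be sketched with immutable records. This is a hedged illustration, not the ScienceClaw data model: the `Artifact` class, its fields, the content-hash id scheme, and the helpers `lineage` and `open_needs` are all hypothetical, and an open need is simplified here to "a schema type with no accepted artifact".

```python
from dataclasses import dataclass
from typing import Optional, Tuple
import hashlib
import json


@dataclass(frozen=True)  # frozen: artifacts cannot be mutated after creation
class Artifact:
    type_name: str
    payload: str
    skill: Optional[str] = None      # producing skill; None for root artifacts
    parents: Tuple[str, ...] = ()    # ids of all parent artifacts (multi-parent)

    @property
    def artifact_id(self) -> str:
        # Illustrative content-derived id; the real system's id scheme is not specified.
        body = json.dumps([self.type_name, self.payload, self.skill, list(self.parents)])
        return hashlib.sha256(body.encode()).hexdigest()[:12]


def lineage(artifact_id, store):
    """Return the ids of all ancestors of an artifact by walking parent tuples."""
    seen = set()
    stack = list(store[artifact_id].parents)
    while stack:
        pid = stack.pop()
        if pid not in seen:
            seen.add(pid)
            stack.extend(store[pid].parents)
    return seen


def open_needs(schema_types, store):
    """Simplified open needs: schema types that have no accepted artifact yet."""
    present = {a.type_name for a in store.values()}
    return sorted(set(schema_types) - present)


net = Artifact("FiberNetwork", "network-A")
fit = Artifact("StressFit", "fit-A")
ot = Artifact("OrientationTensor", "tensor-A", "computeOrientation", (net.artifact_id,))
model = Artifact("Model1", "surrogate-A", "proposeModels", (ot.artifact_id, fit.artifact_id))
store = {a.artifact_id: a for a in (net, fit, ot, model)}
```

Because `model` has two parents, its provenance edge is a multi-input operation rather than a unary arrow, which is the situation the colored-operad encoding is meant to capture.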
3. Discovery as Verified Regime Transition
CategoryScienceClaw formalizes discovery as a verified regime transition $(F, \alpha) : (\mathcal{C}, X_t) \to (\mathcal{C}', X_{t+1})$:
- $F : \mathcal{C} \to \mathcal{C}'$ is a functor extending or transforming the category of types/operations, e.g., by adding new artifact types for accepted model surrogates.
- $\alpha : X_t \Rightarrow X_{t+1} \circ F$ is a componentwise injective natural transformation (into the restriction of $X_{t+1}$ along $F$) that preserves old provenance in the new state.
- The left Kan extension $\mathrm{Lan}_F X_t$ functorially transports old artifacts into the expanded schema. For each new type $c'$, $(\mathrm{Lan}_F X_t)(c')$ is the colimit of $X_t(c)$ over all sources $(c, F(c) \to c')$.
- The residual content at $c'$ is $X_{t+1}(c') \setminus \operatorname{im}\bigl((\mathrm{Lan}_F X_t)(c') \to X_{t+1}(c')\bigr)$, i.e., new accepted artifacts not derivable by transport from the old regime. This residual is the mathematically certified “new knowledge”.
Gates are reapplied both to the transported substate and to the aggregate new state to certify correctness and novelty. Only regime transitions with nontrivial residuals constitute genuine scientific discovery (Wang et al., 31 May 2026).
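The residual computation can be sketched in a heavily simplified finite setting. The code below assumes the schema functor is an inclusion, that old artifact ids are kept unchanged in the new state, and that transport into a type is generated only by single skills running directly from old types; under those assumptions the image of the transported content is a plain set, and the residual is a set difference. The skill name `seedModel` and all artifact ids are hypothetical.

```python
def transported_image(old_state, new_skills, target_type):
    """Artifacts of target_type obtainable from old-regime artifacts.

    old_state:  dict mapping old type -> set of artifact ids
    new_skills: dict mapping skill name -> (src_type, dst_type, lookup table)
    Only one-step skills from old types are considered (a simplifying assumption).
    """
    # Old artifacts already at this type are carried over unchanged.
    image = set(old_state.get(target_type, set()))
    for src, dst, table in new_skills.values():
        if dst == target_type and src in old_state:
            image |= {table[x] for x in old_state[src] if x in table}
    return image


def residual(old_state, new_state, new_skills, target_type):
    """Accepted artifacts at target_type that transport alone does not produce."""
    accepted = new_state.get(target_type, set())
    return accepted - transported_image(old_state, new_skills, target_type)


old_state = {"OrientationTensor": {"ot1"}}
new_state = {
    "OrientationTensor": {"ot1"},
    "Model1": {"m_from_ot1", "m_new"},
}
new_skills = {
    "seedModel": ("OrientationTensor", "Model1", {"ot1": "m_from_ot1"}),
}

model_residual = residual(old_state, new_state, new_skills, "Model1")
```

In this toy run `m_from_ot1` is reachable from the old regime and `m_new` is not, so only `m_new` counts toward the residual at `Model1`, while the residual at the old type `OrientationTensor` is empty.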
4. Example: Fiber-Network Mechanics Run
The paper provides a detailed example in fiber-network mechanics:
- The schema comprises the types FiberNetwork, OrientationTensor, StrainData, StressData, Model0 (isotropic fiber count), Model1 (orientation-tensor anisotropic surrogate), AICRecord, AcceptedModel, RejectedModel, PerturbationTest, and FigureReport.
- Morphisms include computeOrientation: FiberNetwork $\to$ OrientationTensor, proposeModels: (OrientationTensor, StressFit) $\to$ (Model0, Model1), and AICgate: (Model0, Model1) $\to$ AICRecord.
- The orientation-tensor surrogate is parametrized by an anisotropy and a nematic order parameter, with a linear stress–strain surrogate fit to held-out data (values in kPa).
- Model selection is via a gate: Model1 is accepted iff AIC(Model1) $<$ AIC(Model0).
- The new schema $\mathcal{C}'$ adds types for Model1, AICRecord, AcceptedModel, etc. The Kan extension transports old artifacts; accepted surrogates, parameter fits, and new gate records populate the residual.
- A final morphism synthesizeFigure: (AcceptedModel, PerturbationTest) $\to$ FigureReport encapsulates the reporting step, all with persistent provenance.
This structure allows every scientific decision—hypothesis, modeling step, gate crossing, discourse artifact—to be represented, audited, and transported across discovery regimes (Wang et al., 31 May 2026).
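An AIC-based gate of the kind used in this example can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: it uses the common least-squares form of AIC for Gaussian errors, n ln(RSS/n) + 2k up to an additive constant, and the function names, the optional `margin` parameter, and the residual values are all made up for the example.

```python
import math


def aic_least_squares(residuals, n_params):
    """AIC for a least-squares fit with Gaussian errors, up to an additive constant.

    Uses n * ln(RSS / n) + 2k; requires a nonzero residual sum of squares.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + 2 * n_params


def aic_gate(aic_model0, aic_model1, margin=0.0):
    """Accept Model1 only if its AIC beats Model0's by more than `margin`."""
    return aic_model1 + margin < aic_model0


# Hypothetical held-out residuals for the two candidate models.
residuals_model0 = [0.5, -0.4, 0.6, -0.5]   # isotropic baseline, 1 parameter
residuals_model1 = [0.1, -0.1, 0.1, -0.1]   # anisotropic surrogate, 2 parameters

aic0 = aic_least_squares(residuals_model0, n_params=1)
aic1 = aic_least_squares(residuals_model1, n_params=2)
accepted = aic_gate(aic0, aic1)
```

The extra parameter of the second model is penalized by the 2k term, so the gate only passes when the improvement in fit outweighs the added complexity.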
5. Separation of Retrieval, Search, and Discovery
CategoryScienceClaw formally distinguishes:
- Retrieval: addition of already-typed artifacts within the same schema—no new types or skills.
- Search: iterative application of an endofunctor $T$ within the fixed schema $\mathcal{C}$; provenance and type system are preserved.
- Discovery: only achieved via a verified regime extension $(F, \alpha)$ with non-empty residual content at genuinely new types; this strictly demarcates the generation of previously unreachable artifact classes and new scientific structure.
This separation, grounded in categorical transport, eliminates subjective novelty criteria and anchors revision and knowledge gain in structural regime extensions (Wang et al., 31 May 2026).
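The three-way distinction can be expressed as a small decision procedure. The sketch below is an interpretation for illustration only: the function name, its arguments, and the label returned for extensions that fail the criteria are assumptions, and the residual sets are taken as given rather than computed.

```python
def classify_transition(old_types, new_types, residual_by_type,
                        applied_update, gate_passed):
    """Label a state change as retrieval, search, or discovery.

    old_types / new_types: iterables of type names before and after the change
    residual_by_type:      dict mapping type name -> set of residual artifacts
    applied_update:        True if artifacts came from applying in-schema skills
    gate_passed:           True if the verification gate accepted the new state
    """
    added_types = set(new_types) - set(old_types)
    if not added_types:
        # Same schema: either skills were iterated (search) or artifacts were added (retrieval).
        return "search" if applied_update else "retrieval"
    has_residual = any(residual_by_type.get(t) for t in added_types)
    if has_residual and gate_passed:
        return "discovery"
    return "extension without certified residual"


label = classify_transition(
    old_types=["FiberNetwork", "OrientationTensor"],
    new_types=["FiberNetwork", "OrientationTensor", "Model1"],
    residual_by_type={"Model1": {"m_new"}},
    applied_update=True,
    gate_passed=True,
)
```

Here the schema gained `Model1`, the residual at that type is non-empty, and the gate passed, so the change is labeled a discovery; dropping any one of those three conditions changes the label.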
6. Implications, Engineering Properties, and Extendability
The CategoryScienceClaw approach provides:
- Category-theoretic rigor: All states, transitions, and computational consequences are explicit objects and morphisms, supporting automatic audit and discoverability.
- Extensibility: New tools, models, and evaluation gates are simply new types or morphisms in the schema category $\mathcal{C}$.
- Discourse integration: The publication/discussion category $\mathcal{D}$, with its functorial link to provenance, enables public claims, objections, and replication within the same formal graph.
- Proof-carrying execution: All workflow runs, model selections, and reporting steps are inherent proofs in the categorical data structure.
- Self-revision: Scientific progress occurs not just as answer or artifact generation, but as regime-level schema augmentation with certified residual content.
This framework is agnostic to scientific field and is positioned as both a mathematical language for discovery and a specification for self-revising, agentic AI in science (Wang et al., 31 May 2026).