
NormCode: Nominal Coding & AI Planning

Updated 18 December 2025
  • NormCode is a dual framework that includes a principled numerical encoding method for nominal data, preserving category distinctions via complex-valued representations.
  • It also defines a semi-formal language for AI planning that enforces strict data isolation to prevent context pollution in multi-step LLM workflows.
  • Both constructs exhibit rigorous formalism, practical integration with standard algorithms, and empirical validation in machine learning and AI workflow management.

NormCode denotes two distinct constructs in the research literature: (1) a method for numerical coding of nominal data for statistical machine learning (Gniazdowski et al., 2016), and (2) a semi-formal language for context-isolated AI planning supporting robust multi-step LLM orchestration (Guan, 11 Dec 2025). Both are formalized frameworks operating on complex, high-dimensional domains with a focus on structured representations and rigorous data management, each situated in its respective technical discipline.

1. Numerical Coding of Nominal Data (NormCode Method)

The original NormCode method provides a principled transformation of nominal (categorical) attributes into an injective, information-preserving encoding in either $\mathbb{C}^d$ or $\mathbb{R}^{2d}$, enabling seamless integration with distance- and inner-product-based learning algorithms (Gniazdowski et al., 2016). The core mechanism maps each nominal category $c_j$ (of an attribute $A$ with $k$ categories and record count $N$) to a complex-valued code $R_j$:

$$R_j = \frac{n_j + 1}{2} \exp\!\left(i \, \frac{2\pi j}{k}\right)$$

where $n_j$ is the empirical frequency of $c_j$. This code uniquely distinguishes each category (injectivity) and embeds the global category frequency as the modulus, preserving the distributional structure.
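The encoding rule can be sketched as follows (a minimal sketch, assuming category index $j$ is assigned by order of first appearance; the paper's exact ordering convention may differ):

```python
from collections import Counter
import cmath

def normcode_encode(values):
    """Map each nominal category c_j to R_j = ((n_j + 1)/2) * exp(i*2*pi*j/k),
    where n_j is the empirical frequency of c_j and k the number of categories.
    Index j is fixed here by order of first appearance (an assumption)."""
    counts = Counter(values)
    categories = list(counts)          # j = 0 .. k-1
    k = len(categories)
    return {
        c: ((counts[c] + 1) / 2) * cmath.exp(1j * 2 * cmath.pi * j / k)
        for j, c in enumerate(categories)
    }

codes = normcode_encode(["red", "blue", "red", "green", "red", "blue"])
```

Each category receives a distinct phase, so the codes are injective even when two categories share a frequency; the modulus carries the frequency information.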

Key properties:

  • Information preservation: All category distinctions and aggregate frequencies are recoverable from the code set $\{R_j\}$.
  • Inner-product geometry: Codes enable standard Hermitian inner product and Euclidean norm computations, allowing clustering (k-means), nearest neighbor (k-NN), and SVM classifiers to operate directly in the embedded space.
  • Algebraic linearity: All standard vector operations apply, supporting integration with mixed real-nominal datasets.
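A minimal numerical illustration of this geometry, using made-up complex vectors rather than values from the paper:

```python
import numpy as np

# Two encoded records (complex vectors in C^d); values are illustrative only.
x = np.array([1.5 + 0.5j, 2.0 + 0.0j])
y = np.array([0.5 - 1.0j, 1.0 + 1.0j])

herm = np.vdot(y, x)           # Hermitian inner product <x, y> = sum x_i * conj(y_i)
dist = np.linalg.norm(x - y)   # Euclidean distance in the embedded space

def split(z):
    """Real split representation C^d -> R^(2d); preserves Euclidean distances."""
    return np.concatenate([z.real, z.imag])

# Distances agree between C^d and the R^(2d) split, so real-valued
# algorithms (k-means, k-NN, SVM) can operate on split vectors directly.
assert np.isclose(dist, np.linalg.norm(split(x) - split(y)))
```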

The transformation algorithm consists of modulus and phase assignment for each category, vector formation across attributes, and optional conversion to a real-valued split representation. Empirical evaluation on a benchmark car-sales dataset (with 2 numerical and 5 nominal features) showed superior classification accuracy for NormCode-only and hybrid (numerical + NormCode) representations compared to ad-hoc integer encodings, with over 90% accuracy in several random k-means runs (summarized in the following table):

Features                      Runs ≥90%  Runs ≥80%  Runs ≥70%  Runs ≥60%  Runs ≥50%
Ad hoc nominal + numerical        1          –          –          –          –
Numerical only                   12          1          7          –          –
Only NormCode nominal             1          8          3          8          –
Numerical + NormCode              3          4         10          2          1

(– indicates no runs reported at that threshold)

Computationally, the method exhibits linear scaling with sample size $N$ and attribute count $q$, and is numerically robust for typical datasets. For very high-cardinality attributes, practitioners may need to aggregate rare categories or introduce smoothing.
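One common mitigation, not prescribed by the original method, is to pool rare categories into a single pseudo-category before encoding:

```python
from collections import Counter

def aggregate_rare(values, min_count=5, other="__OTHER__"):
    """Pool categories rarer than min_count into one pseudo-category.
    A standard high-cardinality mitigation; the threshold and sentinel
    name here are illustrative choices, not part of the NormCode spec."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

data = ["a"] * 10 + ["b"] * 6 + ["c", "d", "e"]   # c, d, e are rare
pooled = aggregate_rare(data, min_count=5)
```

After pooling, the encoder sees a smaller $k$, which keeps the phase spacing $2\pi/k$ from collapsing toward zero.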

2. The NormCode Semi-Formal Language for AI Planning

The NormCode language for AI workflows centers on the elimination of context pollution in multi-step LLM pipelines by enforcing strict data isolation across inference steps (Guan, 11 Dec 2025). Each inference step operates solely on explicitly declared inputs, and the global context window never expands automatically as steps are composed.

The language is realized in three mutually isomorphic formats:

  • .ncds ("Draft Straightforward"): Human-authorable, lightly indexed plans.
  • .ncd ("Formal Draft"): Fully resolved, machine-executable representations with explicit flow/value indices.
  • .ncn: Natural language paraphrase for human verification.

No semantic content is lost across formats; translations are lossless. The language’s grammar, while not formalized in BNF, includes constructs for output assignment, operation characterization (semantic or syntactic), input referencing, quantifiers, and explicit data flow.

NormCode supports a strict division between:

  • Syntactic operations: Pure, deterministic data restructuring (zero LLM tokens, 100% reliable).
  • Semantic operations: Nondeterministic, LLM-driven reasoning steps (token-billed, associated with local success/failure metrics).

The orchestrator executes NormCode plans by dependency-driven scheduling over a global waitlist, checkpointing every step to a SQLite database for full traceability and rollback. Looping and quantification generate new inferences dynamically, and flow indices guarantee branch independence and reproducibility.
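The scheduling loop can be sketched as follows; the function and table names are illustrative assumptions, not the actual NormCode orchestrator API:

```python
import sqlite3

def run_plan(steps, db_path=":memory:"):
    """steps: {name: (deps, fn)}. Each fn receives ONLY its declared inputs
    (strict data isolation); every completed step is checkpointed to SQLite."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS checkpoints (step TEXT PRIMARY KEY, value TEXT)")
    store, waitlist = {}, list(steps)
    while waitlist:
        # Dependency-driven scheduling: run every step whose inputs are ready.
        ready = [s for s in waitlist if all(d in store for d in steps[s][0])]
        if not ready:
            raise RuntimeError("cyclic or unsatisfiable dependencies")
        for s in ready:
            deps, fn = steps[s]
            store[s] = fn(*(store[d] for d in deps))
            conn.execute("INSERT OR REPLACE INTO checkpoints VALUES (?, ?)",
                         (s, repr(store[s])))
            conn.commit()
            waitlist.remove(s)
    return store

plan = {
    "a":   ((), lambda: 2),
    "b":   ((), lambda: 3),
    "sum": (("a", "b"), lambda x, y: x + y),
}
result = run_plan(plan)
```

Because each step sees only its declared inputs, no global context accumulates as steps are composed, and the checkpoint table supports replay and rollback.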

3. Formal Structure and Execution Semantics

The core formalism distinguishes two execution regimes for inferences, as described by the following rules:

$$\frac{\forall i.\; \Gamma(r_i) = v_i \qquad \text{op is syntactic with } g}{\Gamma \vdash o \leftarrow_{\mathrm{syn}}\, g(r_1,\ldots,r_n) \;\longrightarrow\; \Gamma[o \mapsto g(v_1,\ldots,v_n)]}$$

$$\frac{\forall i.\; \Gamma(r_i) = v_i \qquad \text{op is semantic with } f}{\Gamma \vdash o \leftarrow_{\mathrm{sem}}\, f(r_1,\ldots,r_n) \;\longrightarrow\; \Gamma[o \mapsto \mathrm{LLM}(f, v_1, \ldots, v_n)]}$$

Here, $\Gamma$ is the store mapping concept references to data values, and operations apply either pure functions $g$ or LLM invocations parameterized by function concepts $f$. This model provides precise cost, data provenance, and reliability accounting at each step.
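A toy interpreter for these two regimes, with the LLM invocation stubbed out (the step signature is an illustrative assumption):

```python
def step(gamma, out, kind, fn, refs, llm=None):
    """Execute one inference over store gamma.
    kind="syn": apply pure function fn to the resolved values.
    kind="sem": delegate to an LLM call parameterized by function concept fn."""
    vals = [gamma[r] for r in refs]      # forall i. Gamma(r_i) = v_i
    if kind == "syn":
        gamma[out] = fn(*vals)           # Gamma[o -> g(v_1, ..., v_n)]
    elif kind == "sem":
        gamma[out] = llm(fn, *vals)      # Gamma[o -> LLM(f, v_1, ..., v_n)]
    return gamma

gamma = {"x": 4, "y": 5}
step(gamma, "z", "syn", lambda a, b: a * b, ["x", "y"])
# Semantic step with a stand-in "LLM" that just records the call it would make.
step(gamma, "w", "sem", "add", ["x", "y"], llm=lambda f, *vs: f"{f}{vs} -> ?")
```

Syntactic steps cost nothing and are fully reproducible; semantic steps are the only place nondeterminism (and token cost) enters the store.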

Isomorphic plan representations (.ncds, .ncd, .ncn) enable progressive formalization and human-in-the-loop verification throughout authoring, compilation, and deployment.

4. Representative Case Studies and Empirical Validation

Demonstrations within (Guan, 11 Dec 2025) include:

  • Base-X Addition Algorithm: Fully pipelined plan for digit-wise addition with carry, implemented using both deterministic tensor logic (iteration, data splitting) and localized semantic LLM steps ("sum these digits + carry"). Achieves 100% accuracy over arbitrary-length inputs, with 25 semantic LLM steps per iteration, independent of input length.
  • Self-Hosted Compiler Pipeline: Five-phase code generation workflow, spanning confirmation, hierarchical deconstruction, formalization, contextualization, and materialization, written and executed entirely in NormCode. The pipeline produces all three plan representations and input/output manifests, and is fully auditable. End-to-end compilation completes in approximately two minutes with ten semantic LLM calls and about 50 inferences.
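The deterministic core of the base-X plan can be sketched in a few lines, with the per-position sum standing in for the plan's localized semantic step:

```python
def base_x_add(a, b, base=10):
    """Digit-wise addition with carry over arbitrary-length digit lists
    (least-significant digit first). In the NormCode plan the per-position
    sum is a semantic LLM step; here it is a pure function for illustration."""
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        out.append(s % base)
        carry = s // base
    if carry:
        out.append(carry)
    return out

# 789 + 456 = 1245, digits least-significant first
assert base_x_add([9, 8, 7], [6, 5, 4]) == [5, 4, 2, 1]
```

Because each position depends only on its two digits and the incoming carry, the plan's per-step inputs stay constant regardless of input length.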

Auditability is maintained by persistent indexing (flow indices) and logging of all LLM calls, prompts, outputs, and data dependencies.

5. Applications, Limitations, and Prospective Enhancements

The NormCode language is positioned for contexts requiring regulatory audit, traceability, and robustness:

  • Legal reasoning: Statutory interpretation, contract review, and audit trails with enforced data isolation.
  • Medical planning: Step-wise diagnosis and treatment recommendations with deterministic/supervised LLM boundaries.
  • Financial analysis: Staged forecasting, risk assessment, and backtestable trace capture.

Limitations acknowledged in the primary literature:

  • Syntax density in fully formalized (.ncd) plans; manual editing risks index corruption (thus edits should be performed via .ncds + recompile).
  • Verbosity, as explicit data-flow expands even simple tasks.
  • Brittleness in the deconstruction phase (NL-to-.ncd), which remains LLM-dependent.

Planned future work includes: IDE support with structure-aware visualization, fine-tuned deconstruction models for robustness, multi-agent coordination primitives, and domain-specific semantic typing. Quantitative user studies comparing NormCode with direct prompting and other agent frameworks (e.g., LangChain, AutoGPT) on benchmarks such as HumanEval and GAIA are proposed as evaluation priorities.

6. Distinctions and Relationship to Other “NormCode” Usages

The NormCode nomenclature also arises in the context of Norm-Trace codes in algebraic geometry and coding theory (Bartoli et al., 2022), but this is unrelated to the numerical coding method or the AI planning language. In that setting, “Norm” refers to field-theoretic norm maps and trace functions on algebraic curves, with a focus on evaluation codes, weight distributions, and minimal codewords, rather than data representation or workflow planning.

7. Summary

NormCode (as numerical coder) and NormCode (as a context-isolated AI planning language) are both characterized by formal discipline, information preservation, and compositional integration with larger data or inference structures. The former advances categorical feature incorporation for traditional ML algorithms, while the latter provides an auditable, formally specified platform for multi-step LLM workflows, addressing the critical problem of context pollution in high-stakes AI applications. Both frameworks exhibit rigorous separation of concerns and are positioned as foundational building-blocks for their respective disciplines (Gniazdowski et al., 2016, Guan, 11 Dec 2025).
