
Codified Decision Trees for Agent Behavior

Updated 4 February 2026
  • CDT is a hierarchical, explicitly interpretable decision structure that models narrative agent behavior through scene-conditioned rules.
  • It is induced from scene-action pairs using clustering, LLM-driven hypothesis generation, and rigorous NLI-based validation.
  • Empirical results show that CDT and CDT-Lite outperform prompting, fine-tuning, retrieval, and hand-authored baselines while ensuring deterministic, transparent, and robust agent behavior.

A Codified Decision Tree (CDT) is an explicitly interpretable, executable decision structure for encoding behavioral profiles of agents, particularly in narrative or role-playing (RP) environments. Unlike traditional, static hand-authored profiles, CDT is constructed via a data-driven induction process over (scene, action) pairs, yielding a hierarchical tree whose branches are labeled by validated scene-conditioned predicates and whose leaves comprise grounded behavioral statements. This framework supports deterministic inference, rigorous validation, and transparent inspection, resulting in robust agent consistency across diverse contexts (Peng et al., 15 Jan 2026).

1. Formal Definition and Structure

A CDT for a character $x$ is a rooted tree $T$ whose nodes $v$ hold two kinds of content:

  • A (possibly empty) set $H_v$ of behavioral statements $h \in A$, where $A$ is the set of grounded action statements.
  • A (possibly empty) set of outgoing edges $(v \to v_i)$, each labeled by a predicate question $q_i$ on scene descriptions.

Given the space $S$ of all textual scenes and a three-valued discriminator function $\mathrm{check}(s,q) \in \{\mathrm{True}, \mathrm{False}, \mathrm{Unknown}\}$, the execution (inference) semantics for a scene $s_0$ are:

  1. Initialize the grounding set $G \gets H_{\mathrm{root}}$.
  2. For each outgoing edge $(\mathrm{root} \to v_i)$ labeled by $q_i$, if $\mathrm{check}(s_0, q_i) = \mathrm{True}$, update $G \gets G \cup H_{v_i}$ and recurse on $v_i$.
  3. The output $G(s_0)$ is the union of all $H_v$ for those $v$ whose path from the root satisfies every traversed $q_i$.

Each edge predicate $q$ formalizes a rule antecedent ("if $C$ then..."), and each $h \in H_v$ is a rule consequent ("...then $A$"). A rule $R$ is the pair $(C \to A)$, where $C(s) = \bigwedge_{i \in I} [\mathrm{check}(s, q_i) = \mathrm{True}]$ and $A$ is drawn from $H_v$.
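
As a concrete illustration, the following is a minimal Python sketch (not the authors' implementation) of a CDT node and the deterministic inference traversal above; the `check` callable is an assumed stand-in for the scene-predicate discriminator and is supplied externally:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

# Minimal sketch of a CDT node: H_v as `statements`, outgoing edges as
# (predicate-question q_i, child v_i) pairs.
@dataclass
class CDTNode:
    statements: Set[str] = field(default_factory=set)                    # H_v
    children: List[Tuple[str, "CDTNode"]] = field(default_factory=list)  # (q_i, v_i)

def infer(node: CDTNode, scene: str,
          check: Callable[[str, str], str]) -> Set[str]:
    """Return G(s_0): the union of H_v over every node whose root path
    satisfies all traversed predicates (Unknown prunes like False)."""
    grounding = set(node.statements)           # G <- H_v
    for question, child in node.children:
        if check(scene, question) == "True":   # only True descends
            grounding |= infer(child, scene, check)
    return grounding
```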

2. Learning Algorithm and Induction Process

CDT is induced from a dataset $D$ of $(\text{scene } s, \text{action } a)$ pairs using the following recursive algorithm:

  • Clustering: Similar $(s,a)$ pairs are grouped, e.g., by clustering their semantic embeddings.
  • Hypothesis Generation: For each cluster, an LLM is prompted to propose candidate $(q, h)$ pairs, where $q$ is a predicate applicable to $s$ and $h$ is a behavioral action.
  • Validation: Each hypothesis is evaluated on $D$ using NLI-style statistics:
    • $r_e$: the number of pairs for which $\mathrm{NLI}(a \to h) = \text{entail}$
    • $r_c$: the number of pairs for which $\mathrm{NLI}(a \to h) = \text{contradict}$
    • $r_n = |D| - r_e - r_c$
    • $\mathrm{acc} = r_e / (r_e + r_c)$ (entail-accuracy)
    • $\mathrm{app} = r_e / (r_e + r_n + r_c)$ (applicability)
  • Acceptance/Rejection/Refinement:
    • If $\mathrm{acc} \geq \theta_{acc}$, accept as a rule;
    • If $\mathrm{acc} \leq \theta_{rej}$ or $|D'|$ is small, reject;
    • If $\mathrm{frac} \leq \theta_f$ and depth $< d_{\max}$, recurse for further specialization.
  • Termination Criteria: The process stops when no further refinement is warranted.

Key hyperparameters include the following (a sketch of the validation step, using these values, follows the list):

  • $\theta_{acc}$ (acceptance threshold, e.g., 0.75)
  • $\theta_{rej}$ (rejection threshold, e.g., 0.50)
  • $\theta_f$ (filter threshold, e.g., 0.75)
  • $d_{\max}$ (maximum depth)
  • $\min |D'|$ for recursion (e.g., 16)
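
The accept/reject/refine decision can be sketched as below. This is a hedged reading, not the authors' code: `nli(premise, hypothesis)` stands in for any three-way NLI classifier, the default thresholds mirror the example values above, `d_max=5` is illustrative, and interpreting $\mathrm{frac}$ as the applicable fraction is an assumption:

```python
# Sketch of NLI-based hypothesis validation. `pairs` is the cluster's list of
# (scene, action) tuples; `nli` returns "entail", "contradict", or "neutral".
def validate_hypothesis(pairs, h, nli, depth,
                        theta_acc=0.75, theta_rej=0.50, theta_f=0.75,
                        d_max=5, min_size=16):
    r_e = sum(1 for _, a in pairs if nli(a, h) == "entail")
    r_c = sum(1 for _, a in pairs if nli(a, h) == "contradict")
    r_n = len(pairs) - r_e - r_c
    acc = r_e / (r_e + r_c) if (r_e + r_c) > 0 else 0.0  # entail-accuracy
    app = r_e / (r_e + r_n + r_c) if pairs else 0.0      # applicability

    if acc >= theta_acc:
        return "accept"        # promote (q, h) to a rule
    if acc <= theta_rej or len(pairs) < min_size:
        return "reject"
    if app <= theta_f and depth < d_max:                 # reading frac as app (assumption)
        return "refine"        # specialize the predicate and recurse
    return "reject"
```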

3. Executability and Interpretability

CDT nodes store explicit, human-readable behavioral statements, and all branch predicates are phrased as linguistically interpretable questions. Deterministic retrieval is guaranteed because $\mathrm{check}(s, q)$ is a deterministic Boolean test (with an Unknown → False policy), so repeated queries on the same scene $s$ yield identical traversals and identical triggered behavioral actions.

Termination and decidability are ensured by constraints on both maximum tree depth and recursion dataset size. The construction guarantees that for any finite dataset the induced CDT is finite and construction halts in $O(d_{\max} \cdot |D|)$ steps (Peng et al., 15 Jan 2026).
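
To make the Unknown → False policy and repeatability concrete, here is a small hedged sketch of a memoized `check` wrapper; `raw_judge` is an assumed three-valued judge (e.g., an NLI model or LLM call), and the caching is one plausible way to keep repeated traversals identical:

```python
from functools import lru_cache

def make_check(raw_judge):
    """Wrap a three-valued judge into a deterministic Boolean-style check.
    Unknown maps to False, and results are cached so repeated queries on
    the same (scene, question) pair reproduce the same traversal."""
    @lru_cache(maxsize=None)
    def check(scene: str, question: str) -> str:
        verdict = raw_judge(scene, question)   # "True" | "False" | "Unknown"
        return "True" if verdict == "True" else "False"
    return check
```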

4. Empirical Results and Benchmarks

CDT and its variant CDT-Lite were evaluated on several benchmarks:

  • Datasets:
    • Fine-grained Fandom: 8 artifacts, 45 characters, 20,778 $(s,a)$ pairs.
    • Bandori Conversational: 8 bands, 40 characters, 7,866 pairs.
    • Bandori Events: 77,182 pairs (scaling study).
  • Metric: Natural language inference (NLI) score. Given a predicted action $\hat{a}$ and reference $a$, $\mathrm{score}(\hat{a}, a) = 100$ if entail, $50$ if neutral, and $0$ if contradict; the average over the evaluation set is reported (a scoring sketch follows the table).
  • Key Results (NLI Score Average):

| System         | Fandom Avg | Bandori Avg |
|----------------|------------|-------------|
| Vanilla        | 55.6       | 65.5        |
| Fine-tune      | 45.7       | 62.9        |
| RICL           | 56.0       | 68.9        |
| ETA            | 56.9       | 72.3        |
| Human          | 58.3       | 71.3        |
| Codified-Human | 59.3       | 71.9        |
| CDT            | 60.8       | 77.7        |
| CDT-Lite       | 61.0       | 79.0        |
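
For reference, a minimal sketch of the NLI-score metric defined above; `nli` is again an assumed three-way classifier, applied here to (predicted, reference) action pairs:

```python
# NLI score: entail -> 100, neutral -> 50, contradict -> 0, averaged.
def nli_score(predicted_actions, reference_actions, nli) -> float:
    points = {"entail": 100, "neutral": 50, "contradict": 0}
    total = sum(points[nli(a_hat, a)]
                for a_hat, a in zip(predicted_actions, reference_actions))
    return total / len(reference_actions)
```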

Removal of clustering, instruction-following embeddings, or validation degrades performance by 1–2 points. Performance scales monotonically with dataset size (Peng et al., 15 Jan 2026).

5. Example Construction

Consider the following illustrative dataset $D$ for a "Hero":

  1. "Dark tunnel ahead..." → "Hero lights torch."
  2. "Walls glint in darkness..." → "Hero lights torch."
  3. "Monster roar nearby..." → "Hero draws sword."
  • Cluster $(1, 2)$; the LLM hypothesizes $q_1 =$ "Does the scene mention darkness?", $h_1 =$ "Hero lights torch." Accepted with $\mathrm{acc} = 1.0$.
  • Cluster $(3)$; the LLM hypothesizes $q_2 =$ "Does the scene indicate the presence of a hostile creature?", $h_2 =$ "Hero draws sword." Accepted.

Final CDT:

$$\text{root: } H = \emptyset \quad \begin{cases} [q_1: \text{"mention darkness?"}] \to v_1, \; H_{v_1} = \{\text{"Hero lights torch."}\} \\ [q_2: \text{"hostile creature?"}] \to v_2, \; H_{v_2} = \{\text{"Hero draws sword."}\} \end{cases}$$
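
Using the `CDTNode`/`infer` sketch from Section 1, this worked example can be assembled and queried as follows; the keyword-matching `toy_check` is a hypothetical stand-in for a real NLI/LLM judge:

```python
# Build the Hero CDT from the two accepted rules.
root = CDTNode(children=[
    ("Does the scene mention darkness?",
     CDTNode(statements={"Hero lights torch."})),
    ("Does the scene indicate the presence of a hostile creature?",
     CDTNode(statements={"Hero draws sword."})),
])

def toy_check(scene: str, question: str) -> str:
    """Toy keyword discriminator standing in for an NLI/LLM judge."""
    cues = {"darkness": ("dark",), "hostile": ("monster", "roar")}
    key = "darkness" if "darkness" in question else "hostile"
    return "True" if any(c in scene.lower() for c in cues[key]) else "False"

print(infer(root, "Dark tunnel ahead...", toy_check))    # {'Hero lights torch.'}
print(infer(root, "Monster roar nearby...", toy_check))  # {'Hero draws sword.'}
```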

6. Comparative Assessment

CDT offers improvements over both hand-authored codified human profiles and other induction methods. On Fandom, CDT-Lite outperforms Codified-Human by +1.7 points (61.0 vs. 59.3 NLI avg); on Bandori, by +7.1 points (79.0 vs. 71.9). Overall, CDTs show relative improvements of 3–10% over the strongest human-authored and prior data-driven baselines (Peng et al., 15 Jan 2026).

While CDT leverages a tree structure reminiscent of classic decision trees, the construction and inference are semantically adapted to natural language scene affordances and behavioral logic, not feature-threshold predicates. By contrast, computational graph representations of traditional binary and oblique decision trees have been formalized via parallel predicate evaluation and bitvector arithmetic over structured inputs, supporting soft traversals and hybridization with differentiable models (Zhang, 2021). CDTs focus distinctly on context-conditional action logic derived from narrative data rather than numerical features.

7. Limitations and Future Developments

Current CDT methodology is restricted to offline (non-continual) construction and induction solely from narrative storyline data, without leveraging canonical trait priors or multimodal context (e.g., game state). Future directions include:

  • Joint CDT induction for multiple interacting characters.
  • Online refinement and continual learning from live agent interaction.
  • Multimodal CDT expansion incorporating event logs and real-time state signals.

These directions address domains where principled, interpretable, and efficiently updatable behavioral logic is required for robust agent grounding under complex, evolving contexts (Peng et al., 15 Jan 2026).
