Chain-of-Layers (CoLa): Adaptive Architectures

Updated 2 July 2026

Chain-of-Layers (CoLa) is a framework that dynamically composes neural network layers and constructs taxonomies through adaptive, layerwise sequences.
It employs methods such as skip/repeat operators and MCTS-based search to tailor model depth per input, enhancing both efficiency and accuracy.
In taxonomy induction, CoLa builds hierarchies iteratively using in-context prompts and ensemble filtering, improving structural coherence and reducing hallucinations.

Chain-of-Layers (CoLa) refers to a family of frameworks that recast either neural network inference or taxonomy induction as a “layer-wise” process of dynamic composition (of functions) or growth (of structure), driven by optimization or prompting. Two distinct lines of research share the name: (1) the test-time architectural adaptation of pretrained LLMs via flexible reordering, skipping, or recurrence of their internal layers (Li et al., 10 Jul 2025), and (2) the in-context, iterative construction of taxonomies through structured prompting and layerwise candidate expansion (Zeng et al., 2024). Each interprets “chain-of-layers” differently to address challenges in efficiency, adaptability, and structure. The following sections give a technical overview of both paradigms: their formal underpinnings, algorithms, empirical findings, and implications.

1. Formal Definitions and Mathematical Frameworks

Dynamic Layer Sequences in LLMs

Let a pretrained LLM consist of an ordered stack of $N$ layers, denoted $\mathcal{L} = (L_1,\,L_2,\,\dots,\,L_N)$ , where each $L_i$ is a deterministic transformation. The standard forward pass is

$f_{\mathrm{orig}}(x) = L_N \circ L_{N-1} \circ \cdots \circ L_1 (x).$

A chain-of-layers (CoLa) generalizes this to any finite sequence drawn (with repetition) from $\{L_1, \dots, L_N\}$ :

$C = (L_{i_1},\,L_{i_2},\,\dots,\,L_{i_k}) \text{ with } i_j \in \{1, \dots, N\},$

$f_C(x) = L_{i_k} \circ \cdots \circ L_{i_1}(x).$

Elementary “skip” and “repeat” operators alter $C$ by removing or repeating contiguous block(s) of layers, forming a combinatorial edit space. These operators formally model path pruning and local recurrence, allowing adaptation of depth and order per input.
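The definitions above can be illustrated with a minimal sketch, assuming layers are modeled as plain Python callables and a chain as a list of 1-based layer indices. The helper names `run_chain`, `skip_block`, and `repeat_block` are hypothetical and chosen for illustration; they are not taken from the source.

```python
from typing import Callable, List, Sequence

Layer = Callable[[float], float]

def run_chain(layers: Sequence[Layer], chain: Sequence[int], x: float) -> float:
    """Apply layers in the order given by `chain` (1-based indices, repetition allowed)."""
    for i in chain:
        x = layers[i - 1](x)
    return x

def skip_block(chain: List[int], start: int, size: int) -> List[int]:
    """Remove `size` contiguous positions beginning at 0-based position `start`."""
    return chain[:start] + chain[start + size:]

def repeat_block(chain: List[int], start: int, size: int) -> List[int]:
    """Duplicate the contiguous block chain[start:start+size] right after itself."""
    block = chain[start:start + size]
    return chain[:start + size] + block + chain[start + size:]

# Toy stand-ins for transformer layers (assumption: scalar state for readability).
layers = [lambda h: h + 1, lambda h: h * 2, lambda h: h - 3]
original = [1, 2, 3]                      # the standard forward pass
skipped = skip_block(original, 1, 1)      # drops layer 2
looped = repeat_block(original, 1, 1)     # applies layer 2 twice
```

Here `original` corresponds to the standard forward pass, while `skipped` and `looped` are two single-edit neighbors in the edit space.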

Chain-of-Layer for Taxonomy Induction

In taxonomy induction, “chain-of-layer” (CoL) refers to a sequence of in-context expansions constructing a directed acyclic graph $T = (V, E)$ of “is-a” relations from a flat entity set $V$. At each iteration $k$, the model selects a new set of entities $V_k$ forming the $k$-th layer beneath current nodes, then attaches them as children to appropriate parents, expanding $T$ layer by layer. The process iterates until all entities are placed, producing a progressively deepening hierarchy, not a fixed forward pass (Zeng et al., 2024).

2. Search, Optimization, and Iterative Protocols

MCTS-based CoLa Selection for LLMs

The CoLa search space is exponentially large due to combinatorial skip/repeat edits. CoLa employs Monte Carlo Tree Search (MCTS) to efficiently discover layer sequences that maximize a per-sample reward trading correctness against path length, of the form

$R(C) = \mathrm{acc}(C) - \lambda\,|C|,$

where $|C|$ is path length, $\lambda \ge 0$ weights the length penalty, and $\mathrm{acc}(C)$ is $1$ if $f_C(x)$ matches the gold answer, $0$ otherwise. MCTS nodes represent partial or complete sequences, with transitions determined by skip/repeat actions. At each node, UCB-based selection guides exploration:

$\mathrm{UCB}(v) = \frac{Q(v)}{n(v)} + c\,\sqrt{\frac{\ln N_{\mathrm{sim}}}{n(v)}},$

where $Q(v)$ is cumulative reward, $n(v)$ the visit count, $N_{\mathrm{sim}}$ total simulations, and $c$ the exploration constant. Rollouts estimate the value of leaves, and rewards are backpropagated to guide search toward sequences on the accuracy-depth Pareto frontier (Li et al., 10 Jul 2025).
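As a small worked example of UCB-based selection, the sketch below scores children from their cumulative reward, visit count, and the total number of simulations. It assumes the standard UCB1 form; the function names and the default exploration constant of 1.4 are illustrative choices, not values from the source.

```python
import math

def ucb_score(q: float, n: int, total: int, c: float = 1.4) -> float:
    """Mean reward plus exploration bonus; unvisited nodes are tried first."""
    if n == 0:
        return math.inf
    return q / n + c * math.sqrt(math.log(total) / n)

def select_child(stats, total, c=1.4):
    """stats maps a child id to (cumulative reward, visit count)."""
    return max(stats, key=lambda k: ucb_score(stats[k][0], stats[k][1], total, c))

# Child "a" has the better mean (0.75 vs 0.5) but has been visited four times.
stats = {"a": (3.0, 4), "b": (0.5, 1)}
greedy_choice = select_child(stats, total=5, c=0.0)     # pure exploitation
explore_choice = select_child(stats, total=5, c=1.4)    # bonus favors the rarely visited child
```

With `c = 0` the better mean wins; with a positive exploration constant the rarely visited child is selected instead, which is the behavior that keeps the search from committing too early to one edit.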

Iterative Prompting and Ensemble Filtering in CoL

CoL for taxonomy induction is realized via a staged, top-down expansion protocol:

Layer-wise Candidate Selection (CoL-K): At each layer $k$, the LLM receives the current taxonomy $T_{k-1}$, remaining entities $V_{\mathrm{rem}}$, and demonstration set $D$, and is prompted to select appropriate children for each node in the previous layer $V_{k-1}$.
Expansion: The selected entities $V_k$ and their assignments form new candidate edges $E_k$. Pruning is performed via an ensemble filter.
Ensemble-based Ranking Filter: Masked LLM scoring (using SciBERT) assigns scores to candidate parent–child pairs via a template ensemble, ranking parent candidates by likelihood and retaining only the top-N (typically top-10) per child. Entities removed are returned to the pool for the next layer.

This iterative, filter-augmented process continues until entities are exhausted, ensuring both structural coherence and reduction of hallucinated content (Zeng et al., 2024).
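The ranking filter can be sketched as follows. This is a simplified illustration rather than the authors' implementation: per-template likelihoods are passed in as plain dictionaries standing in for masked-LM scores, the similarity is assumed to be the reciprocal rank averaged over templates, and the names `ensemble_sim` and `filter_edges` are hypothetical.

```python
from typing import Dict, List, Tuple

# One dict per prompt template: (parent, child) -> likelihood (stand-in for masked-LM output).
Scores = Dict[Tuple[str, str], float]

def ensemble_sim(template_scores: List[Scores], child: str, parents: List[str]) -> Dict[str, float]:
    """Average reciprocal rank of each candidate parent for `child` across templates."""
    sim = {p: 0.0 for p in parents}
    for scores in template_scores:
        ranked = sorted(parents, key=lambda p: scores[(p, child)], reverse=True)
        for rank, p in enumerate(ranked, start=1):
            sim[p] += 1.0 / rank
    return {p: s / len(template_scores) for p, s in sim.items()}

def filter_edges(edges, template_scores, parents, top_n=10):
    """Keep an edge only if its parent is among the top_n ranked parents of the child."""
    kept, recycled = [], []
    for parent, child in edges:
        sim = ensemble_sim(template_scores, child, parents)
        top = sorted(parents, key=lambda p: sim[p], reverse=True)[:top_n]
        if parent in top:
            kept.append((parent, child))
        else:
            recycled.append(child)  # returned to the pool for the next layer
    return kept, recycled

parents = ["animal", "plant", "tool"]
t1 = {("animal", "dog"): 0.9, ("plant", "dog"): 0.1, ("tool", "dog"): 0.2,
      ("animal", "hammer"): 0.1, ("plant", "hammer"): 0.2, ("tool", "hammer"): 0.9}
t2 = {("animal", "dog"): 0.8, ("plant", "dog"): 0.3, ("tool", "dog"): 0.1,
      ("animal", "hammer"): 0.2, ("plant", "hammer"): 0.1, ("tool", "hammer"): 0.7}
kept, recycled = filter_edges([("animal", "dog"), ("plant", "hammer")], [t1, t2], parents, top_n=1)
```

In this toy run the implausible edge from "plant" to "hammer" is dropped and "hammer" goes back to the pool, while the edge from "animal" to "dog" survives.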

3. Algorithmic Details and Pseudocode

MCTS for Test-Time Architecture Adaptation

Algorithmic steps for MCTS-CoLa include:

Selection: Traverse tree from root by selecting children with the highest UCB.
Expansion: Expand unvisited children via a random valid skip or repeat action, subject to a maximum path length.
Simulation: For expanded leaves, roll out to a full valid sequence and compute its reward $R(C)$.
Backpropagation: Propagate rollout reward along the traversal path, updating the cumulative reward $Q(v)$ and visit count $n(v)$ for each node.
After the simulation budget $N_{\mathrm{sim}}$ is exhausted, output Pareto-optimal (accuracy, path length) pairs.

The protocol is concretely instantiated with a fixed set of block sizes for skip/repeat manipulation and a tunable depth constraint (Li et al., 10 Jul 2025).
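The four steps above can be sketched on a toy problem. This is a minimal illustration under several assumptions not taken from the source: edits act on single positions (block size 1), every node already holds a complete layer sequence so the simulation step evaluates it directly instead of running a random rollout, the reward is correctness minus a length penalty with weight `lam`, and the search returns the single best chain rather than a Pareto set. All names (`Node`, `edits`, `mcts_cola`) are hypothetical.

```python
import math
import random

class Node:
    """One search-tree node; `chain` is a tuple of 1-based layer indices."""
    def __init__(self, chain, parent=None):
        self.chain, self.parent = chain, parent
        self.children, self.untried = [], None
        self.q, self.n = 0.0, 0  # cumulative reward, visit count

def edits(chain, max_len):
    """All chains reachable by skipping or repeating one position."""
    out = []
    for i in range(len(chain)):
        if len(chain) > 1:
            out.append(chain[:i] + chain[i + 1:])   # skip position i
        if len(chain) < max_len:
            out.append(chain[:i + 1] + chain[i:])   # repeat position i
    return list(dict.fromkeys(out))                 # drop duplicates, keep order

def mcts_cola(n_layers, reward_fn, n_sim=200, max_edits=3, max_len=None, c=1.4, seed=0):
    rng = random.Random(seed)
    max_len = max_len or 2 * n_layers
    root = Node(tuple(range(1, n_layers + 1)))
    best = (reward_fn(root.chain), root.chain)
    for _ in range(n_sim):
        node, depth = root, 0
        # Selection: descend through fully expanded nodes by highest UCB.
        while depth < max_edits:
            if node.untried is None:
                node.untried = edits(node.chain, max_len)
                rng.shuffle(node.untried)
            if node.untried or not node.children:
                break
            parent = node
            node = max(parent.children, key=lambda ch: ch.q / ch.n
                       + c * math.sqrt(math.log(parent.n) / ch.n))
            depth += 1
        # Expansion: apply one untried skip/repeat edit.
        if depth < max_edits and node.untried:
            child = Node(node.untried.pop(), parent=node)
            node.children.append(child)
            node = child
        # Simulation: the chain is already complete, so score it directly.
        r = reward_fn(node.chain)
        best = max(best, (r, node.chain))
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.q += r
            node.n += 1
            node = node.parent
    return best

# Toy setup: three scalar "layers"; the full forward pass gets the wrong answer.
layers = [lambda h: h + 1, lambda h: h * 2, lambda h: h - 3]
x, gold, lam = 1, 4, 0.3

def run(chain, h):
    for i in chain:
        h = layers[i - 1](h)
    return h

def reward(chain):
    return float(run(chain, x) == gold) - lam * len(chain) / len(layers)

best_reward, best_chain = mcts_cola(len(layers), reward)
```

In this toy setting the original three-layer pass maps the input to 1 instead of 4, and the search finds a two-layer chain that reaches the gold answer, mirroring the observation that skipping a misleading layer can fix a prediction.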

Chain-of-Layer Protocol for Taxonomy

The core loop can be formalized as follows:

Initialize taxonomy $T_0$ with seed root $r$; set $V_{\mathrm{rem}} = V \setminus \{r\}$.
For each layer $k = 1, 2, \dots$:
- Invoke CoL-K to select $V_k$ and parent assignments using the in-context prompt.
- Update $T_k \leftarrow T_{k-1} \cup E_k$ and remove $V_k$ from $V_{\mathrm{rem}}$.
- Apply the EnsembleFilter via Equation (1), an ensemble score over $M$ prompt templates of the form
$\mathrm{Score}(p, c) = \frac{1}{M} \sum_{m=1}^{M} \mathrm{Sim}_m(p, c),$
where $\mathrm{Sim}_m(p, c)$ is inversely proportional to the rank of parent $p$ for child $c$ under template $m$ in the masked LM.
- Retain only high-scoring edges; recycled entities proceed to the next iteration.

Terminate when $V_{\mathrm{rem}} = \emptyset$.

The process is elaborated in Algorithm 1 of (Zeng et al., 2024).
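The loop can be sketched in a few lines. This is an illustrative skeleton, not Algorithm 1 itself: the LLM prompt and the ensemble filter are replaced by caller-supplied functions (`select_layer` and `keep_edge`, both hypothetical names), and the toy stubs below consult a small gold hierarchy so the example runs without any model.

```python
from typing import Callable, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)

def chain_of_layer(root: str, entities: Iterable[str],
                   select_layer: Callable[[List[str], Set[str], List[Edge]], List[Edge]],
                   keep_edge: Callable[[Edge], bool],
                   max_layers: int = 20):
    """Grow a taxonomy layer by layer; returns (edges, unplaced entities)."""
    edges: List[Edge] = []
    frontier = [root]
    remaining = set(entities) - {root}
    for _ in range(max_layers):
        if not remaining or not frontier:
            break
        proposed = select_layer(frontier, remaining, edges)  # stand-in for the CoL-K prompt
        kept = [e for e in proposed if e[1] in remaining and keep_edge(e)]  # stand-in for the filter
        if not kept:
            break  # no progress in this layer; stop instead of looping
        edges.extend(kept)
        frontier = [child for _, child in kept]
        remaining -= set(frontier)  # rejected children stay in the pool
    return edges, remaining

# Toy stubs (assumption): a gold child -> parent map drives both selector and filter.
gold = {"animal": "entity", "plant": "entity", "dog": "animal", "cat": "animal", "oak": "plant"}

def select_layer(frontier, remaining, edges):
    proposed = [(gold[c], c) for c in sorted(remaining) if gold[c] in frontier]
    if "entity" in frontier:
        proposed.append(("entity", "dog"))  # simulated hallucinated edge
    return proposed

def keep_edge(edge):
    return gold.get(edge[1]) == edge[0]

edges, unplaced = chain_of_layer("entity", ["entity", *gold], select_layer, keep_edge)
```

The simulated hallucinated edge from "entity" to "dog" is rejected in the first layer, so "dog" is recycled and attached under "animal" in the second layer.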

4. Empirical Results and Quantitative Analyses

Test-Time Depth Adaptation in LLMs

Across DART-Math (levels 1–5) and commonsense reasoning (ARC-Easy/Challenge), CoLa (MCTS-optimized) yields:

For samples with originally correct predictions, over 75% admit a strictly shorter CoLa maintaining correctness (“C→C” transitions). Average total depth is reduced by 20–30%.
For samples originally predicted incorrectly, over 60% are correctly resolved by some CoLa, with corrective CoLas often even shorter than in C→C cases (suggesting skipping misleading layers is frequently beneficial).
On LLaMA-3B base and ARC-Easy: original accuracy 27.8%, CoLa accuracy 95.8%; average relative depth is ∼50% (C→C) and ∼45% (W→C) of full forward path.
For few samples is the original fixed-depth forward pass itself Pareto-optimal, indicating substantial headroom for adaptive architectures (Li et al., 10 Jul 2025).
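To make the Pareto-optimality criterion concrete, the sketch below filters candidate chains by dominance, treating correctness as the quantity to maximize and depth as the quantity to minimize. The candidate names and layer counts are invented for illustration and are not results from the source.

```python
def pareto_front(candidates):
    """candidates: list of (name, correct, length); returns names of non-dominated entries."""
    front = []
    for name, acc, length in candidates:
        dominated = any(
            a2 >= acc and l2 <= length and (a2 > acc or l2 < length)
            for _, a2, l2 in candidates
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical chains for one input: (label, 1 if correct else 0, number of layers executed).
candidates = [
    ("original", 1, 28),
    ("skip-8", 1, 20),
    ("skip-14", 0, 14),
    ("loop-4", 1, 32),
]
front = pareto_front(candidates)
```

Here the original forward pass is dominated by a shorter chain that is also correct, which is the situation the statistics above describe as common.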

Taxonomy Induction Benchmarks

On WordNet, Wikipedia, DBLP, and SemEval-Sci benchmarks:

CoL (GPT-4, 5-shot) achieves Edge-F1/A-F1 of 57.73%/79.62% (WordNet), 96.43%/— (Wiki), 47.96%/— (DBLP), and 51.59%/— (SemEval-Sci), outperforming both supervised (Graph2Taxo, CTP) and prior prompting-based (TaxonomyGPT) baselines.
Performance drops as taxonomy size exceeds ~80 entities, exposing LLM context and reasoning limitations.
Ablations confirm necessity of both iterative CoL-K logic (for recall) and ensemble filtering (for precision), with the combination achieving optimal overall F1 (Zeng et al., 2024).
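For reference, Edge-F1 can be computed as in the sketch below, assuming the usual set-based definition in which a predicted edge counts as correct only if the same (parent, child) pair appears in the gold taxonomy. The function name and toy edges are illustrative.

```python
def edge_f1(predicted, gold):
    """Return (precision, recall, F1) over sets of (parent, child) edges."""
    pred, gold = set(predicted), set(gold)
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold_edges = [("entity", "animal"), ("animal", "dog"), ("animal", "cat"), ("plant", "oak")]
pred_edges = [("animal", "dog"), ("animal", "cat"), ("plant", "dog")]
precision, recall, f1 = edge_f1(pred_edges, gold_edges)
```

Two of the three predicted edges are correct and two of the four gold edges are recovered, so precision is 2/3, recall is 1/2, and F1 is 4/7.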

5. Theoretical and Practical Implications

Dynamic Inference and Generalization in LLMs

Fixed-depth forward passes are rarely optimal on the evaluated benchmarks; CoLa reveals a latent space of input-dependent architectures:

Shallow “fast thinking” CoLas accelerate inference on easy tasks by up to 30%.
Deep, recurrent “slow thinking” paths recover correct predictions for hard or noisy inputs, boosting robustness without finetuning.
The Transformer’s architectural modularity enables layers to act as reusable, composable modules, introducing a new axis of adaptation orthogonal to weight pruning or early-exit strategies.

This suggests a route toward unified dynamic inference, combining “fast” and “slow” reasoning modes under a pretrained backbone (Li et al., 10 Jul 2025).

Structured Reasoning and Hallucination Control in Taxonomy Induction

CoL transforms taxonomy induction into a series of locally coherent, layerwise expansion tasks, achieving:

Containment of error propagation through ensemble-based hallucination filtering.
State-of-the-art precision and recall across diverse domains and limited-example regimes.
Potential for extension to self-supervised, adaptive, or hybrid architectures that combine discriminative and generative strengths (Zeng et al., 2024).

A plausible implication is that iterative, layerwise reasoning paired with lightweight verification can generalize to other hierarchical or structured prediction tasks.

6. Limitations and Future Directions

Scalability and Robustness

Both interpretations of CoLa exhibit scaling limits: in LLM adaptation, search cost and path length constraints can inhibit deployment on very large models. In taxonomy induction, performance degrades as the number of entities or hierarchy depth exceeds LLM context capacity or attention span. Domain specificity and the persistence of hallucination in specialized settings remain challenges (Zeng et al., 2024).

Future directions include:

Hierarchical self-supervision for variable layer sizes and dynamic sibling grouping in taxonomy induction.
Hybrid protocols incorporating fine-tuned relation scorers.
Adaptive CoLa editing strategies guided by additional meta-information or uncertainty.

7. Application Domains and Prospects

Dynamic depth adaptation via CoLa is relevant for efficiency-critical inference (search, reasoning, low-latency QA), robust deployment in error-prone environments, and resource-constrained contexts. In taxonomy induction, CoL applies to domain-specific ontology construction, knowledge base augmentation, and bootstrapping of symbolic resources in content organization. Both paradigms support interactive workflows where expert oversight or review complements automated CoLa proposals (Li et al., 10 Jul 2025, Zeng et al., 2024).


References

Li et al. (2025). Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs.

Zeng et al. (2024). Chain-of-Layer: Iteratively Prompting Large Language Models for Taxonomy Induction from Limited Examples.
