Chain-of-Layers (CoLa): Adaptive Architectures
- Chain-of-Layers (CoLa) is a framework that dynamically composes neural network layers and constructs taxonomies through adaptive, layerwise sequences.
- It employs methods such as skip/repeat operators and MCTS-based search to tailor model depth per input, enhancing both efficiency and accuracy.
- In taxonomy induction, CoLa builds hierarchies iteratively using in-context prompts and ensemble filtering, improving structural coherence and reducing hallucinations.
Chain-of-Layers (CoLa) refers to a family of frameworks that reformulate either neural network inference or taxonomy induction as the dynamic composition or growth of functions or structure in a “layer-wise” manner, driven by optimization or prompting. Two distinct lines of research use this term: (1) the test-time architectural adaptation of pretrained LLMs via flexible reordering, skipping, or recurrence of their internal layers (Li et al., 10 Jul 2025), and (2) the in-context, iterative construction of taxonomies through structured prompting and layerwise candidate expansion (Zeng et al., 2024). Each applies its own interpretation of “chain-of-layers” to address core challenges in efficiency, adaptability, and structure. The following sections give a technical overview of both paradigms: their formal underpinnings, algorithms, empirical findings, and implications.
1. Formal Definitions and Mathematical Frameworks
Dynamic Layer Sequences in LLMs
Let a pretrained LLM consist of an ordered stack of $L$ layers, denoted $\{\ell_1, \dots, \ell_L\}$, where each $\ell_i$ is a deterministic transformation. The standard forward pass is
$$F(x) = \ell_L \circ \ell_{L-1} \circ \cdots \circ \ell_1(x).$$
A chain-of-layers (CoLa) generalizes this to any finite sequence $P = (i_1, \dots, i_m)$ drawn (with repetition) from $\{1, \dots, L\}$:
$$F_P(x) = \ell_{i_m} \circ \cdots \circ \ell_{i_1}(x).$$
Elementary “skip” and “repeat” operators alter $P$ by removing or repeating contiguous block(s) of layers, forming a combinatorial edit space. These operators formally model path pruning and local recurrence, allowing adaptation of depth and order per input.
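The composition and the two edit operators described above can be illustrated with a minimal sketch. It assumes toy scalar functions in place of Transformer layers and 0-based layer indices; the helper names `run_chain`, `skip`, and `repeat` are hypothetical and not taken from the cited work.

```python
from typing import Callable, List, Sequence

Layer = Callable[[float], float]

def run_chain(layers: Sequence[Layer], path: Sequence[int], x: float) -> float:
    """Apply layers in the order given by `path` (0-based indices, repeats allowed)."""
    for idx in path:
        x = layers[idx](x)
    return x

def skip(path: Sequence[int], start: int, k: int) -> List[int]:
    """Remove the contiguous block path[start:start+k]."""
    return list(path[:start]) + list(path[start + k:])

def repeat(path: Sequence[int], start: int, k: int) -> List[int]:
    """Duplicate the contiguous block path[start:start+k] right after itself."""
    block = list(path[start:start + k])
    return list(path[:start + k]) + block + list(path[start + k:])

# Toy "layers": scalar functions standing in for Transformer blocks (assumption).
layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
full_path = [0, 1, 2]

print(run_chain(layers, full_path, 1.0))                # standard forward pass -> 1.0
print(run_chain(layers, skip(full_path, 1, 1), 1.0))    # layer 1 skipped -> -1.0
print(run_chain(layers, repeat(full_path, 1, 1), 1.0))  # layer 1 applied twice -> 5.0
```

Representing a chain as a plain index sequence keeps the pretrained weights untouched: only the order in which existing layers are applied changes.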
Chain-of-Layer for Taxonomy Induction
In taxonomy induction, “chain-of-layer” (CoL) refers to a sequence of in-context expansions constructing a taxonomy $T$, a directed acyclic graph of “is-a” relations, from a flat entity set $V$. At each iteration $k$, the model selects a new set of entities $V_k \subseteq V$ forming the $k$-th layer beneath current nodes, then attaches them as children to appropriate parents, expanding $T$ layer by layer. The process iterates until all entities are placed, producing a progressively deepening hierarchy, not a fixed forward pass (Zeng et al., 2024).
2. Search, Optimization, and Iterative Protocols
MCTS-based CoLa Selection for LLMs
The CoLa search space is exponentially large due to combinatorial skip/repeat edits. CoLa employs Monte Carlo Tree Search (MCTS) to efficiently discover layer sequences that maximize a per-sample reward $R(P)$, which trades answer correctness against path length. Here $|P|$ is path length, and the correctness indicator $\mathbb{1}[\hat{y}_P = y]$ is $1$ if the prediction $\hat{y}_P$ produced along path $P$ matches the gold answer $y$, $0$ otherwise. MCTS nodes represent partial or complete sequences, with transitions determined by skip/repeat actions. At each node, UCB-based selection guides exploration:
$$\mathrm{UCB}(v) = \frac{Q(v)}{N(v)} + c \sqrt{\frac{\ln N_{\text{total}}}{N(v)}},$$
where $Q(v)$ is cumulative reward, $N(v)$ the visit count, $N_{\text{total}}$ total simulations, and $c$ the exploration constant. Rollouts estimate the value of leaves, and rewards are backpropagated to guide search toward sequences on the accuracy-depth Pareto frontier (Li et al., 10 Jul 2025).
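As a small illustration of the selection rule, the sketch below computes a standard UCB1-style score from the four quantities named above (cumulative reward, visit count, total simulations, exploration constant). The exact form used in the cited work may differ; the function name `ucb_score` and the default `c = 1.0` are assumptions for illustration.

```python
import math

def ucb_score(q: float, n: int, n_total: int, c: float = 1.0) -> float:
    """UCB1-style score: mean reward plus an exploration bonus.

    q: cumulative reward of the node, n: its visit count,
    n_total: total simulations so far, c: exploration constant (assumed default).
    """
    if n == 0:
        return math.inf  # unvisited nodes are tried first
    return q / n + c * math.sqrt(math.log(n_total) / n)

# A rarely visited node can outrank a node with a higher mean reward.
often_visited = ucb_score(q=8.0, n=10, n_total=12)
rarely_visited = ucb_score(q=1.0, n=2, n_total=12)
print(often_visited, rarely_visited)
```

The exploration bonus shrinks as a node accumulates visits, so the search gradually shifts from exploring new edits to exploiting sequences that have scored well.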
Iterative Prompting and Ensemble Filtering in CoL
CoL for taxonomy induction is realized via a staged, top-down expansion protocol:
- Layer-wise Candidate Selection (CoL-K): At each layer $k$, the LLM receives the current taxonomy $T_{k-1}$, remaining entities $V$, and demonstration set $D$, and is prompted to select appropriate children for each node in $T_{k-1}$.
- Expansion: The selected entities $V_k$ and their assignments form new candidate edges $E_k$. Pruning is performed via an ensemble filter.
- Ensemble-based Ranking Filter: Masked LLM scoring (using SciBERT) assigns scores to candidate parent–child pairs via a template ensemble, ranking parent candidates by likelihood and retaining only the top-N (typically top-10) per child. Entities removed are returned to the pool for the next layer.
This iterative, filter-augmented process continues until entities are exhausted, ensuring both structural coherence and reduction of hallucinated content (Zeng et al., 2024).
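The ranking filter can be sketched as follows. This is a minimal illustration, not the authors' implementation: the masked-LM scorer is replaced by a caller-supplied function `score(template, parent, child)`, and the templates, the toy scorer, and the name `ensemble_filter` are hypothetical.

```python
from typing import Callable, List, Sequence, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)

def ensemble_filter(
    candidate_edges: Sequence[Edge],
    parents: Sequence[str],
    templates: Sequence[str],
    score: Callable[[str, str, str], float],
    top_n: int = 10,
) -> Tuple[List[Edge], Set[str]]:
    """Keep an edge only if its parent ranks in the top-N parents for that child."""
    kept: List[Edge] = []
    recycled: Set[str] = set()
    for parent, child in candidate_edges:
        # Average the score of every possible parent over the template ensemble.
        avg = {p: sum(score(t, p, child) for t in templates) / len(templates)
               for p in parents}
        ranked = sorted(parents, key=lambda p: avg[p], reverse=True)
        if parent in ranked[:top_n]:
            kept.append((parent, child))
        else:
            recycled.add(child)  # returned to the pool for the next layer
    return kept, recycled

# Toy stand-in for a masked-LM score (assumption): suffix match between names.
def toy_score(template: str, parent: str, child: str) -> float:
    return 1.0 if child.endswith(parent) else 0.0

templates = ["{child} is a {parent}", "{child} is a kind of {parent}"]
kept, recycled = ensemble_filter(
    [("dog", "sheepdog"), ("cat", "bulldog")],
    parents=["animal", "dog", "cat"],
    templates=templates,
    score=toy_score,
    top_n=1,
)
print(kept)      # [('dog', 'sheepdog')]
print(recycled)  # {'bulldog'}
```

In a real setup, `score` would wrap a masked language model such as SciBERT; here it is deliberately trivial so that the filtering logic is easy to follow.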
3. Algorithmic Details and Pseudocode
MCTS for Test-Time Architecture Adaptation
Algorithmic steps for MCTS-CoLa include:
- Selection: Traverse tree from root by selecting children with the highest UCB.
- Expansion: Expand unvisited children via a random valid skip or repeat action, subject to a maximum path length.
- Simulation: For expanded leaves, roll out to a full valid sequence $P$ and compute its reward $R(P)$.
- Backpropagation: Propagate rollout reward along the traversal path, updating the cumulative reward $Q$ and visit count $N$ for each node.
- After $N_{\text{total}}$ simulations, output Pareto-optimal (accuracy, depth) pairs.
The protocol is concretely instantiated with a fixed set of block sizes for block manipulation and a tunable depth constraint (Li et al., 10 Jul 2025).
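The four steps above can be put together in a compact sketch. Several simplifications are assumed and are not taken from the cited work: every node is already a complete layer sequence, so "simulation" is a direct evaluation; block sizes are `(1, 2)`; and the toy reward is a correctness indicator minus a length penalty `lam * |P| / full_depth`. All names are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

Path = Tuple[int, ...]

def neighbors(path: Path, block_sizes=(1, 2), max_len: int = 6) -> List[Path]:
    """Paths reachable from `path` by one skip or one repeat of a contiguous block."""
    out = set()
    for k in block_sizes:
        for s in range(len(path) - k + 1):
            skipped = path[:s] + path[s + k:]
            if skipped:                      # never produce an empty chain
                out.add(skipped)
            repeated = path[:s + k] + path[s:s + k] + path[s + k:]
            if len(repeated) <= max_len:     # maximum path length constraint
                out.add(repeated)
    return sorted(out)

class Node:
    def __init__(self, path: Path, parent: Optional["Node"] = None):
        self.path, self.parent = path, parent
        self.children: List["Node"] = []
        self.untried: Optional[List[Path]] = None  # filled lazily
        self.q, self.n = 0.0, 0                    # cumulative reward, visit count

def mcts_cola(root_path: Path, reward: Callable[[Path], float], n_sim: int = 200,
              c: float = 1.0, max_edits: int = 3, seed: int = 0) -> Dict[Path, float]:
    """Return every evaluated path with its reward."""
    rng = random.Random(seed)
    root = Node(root_path)
    evaluated: Dict[Path, float] = {root_path: reward(root_path)}
    for _ in range(n_sim):
        node, depth = root, 0
        # Selection: descend by UCB while the node is fully expanded.
        while True:
            if node.untried is None:
                node.untried = neighbors(node.path)
                rng.shuffle(node.untried)
            if node.untried or not node.children or depth >= max_edits:
                break
            total = root.n  # total simulations completed so far
            node = max(node.children, key=lambda ch: ch.q / ch.n
                       + c * math.sqrt(math.log(total) / ch.n))
            depth += 1
        # Expansion: apply one random untried skip/repeat edit.
        if node.untried and depth < max_edits:
            child = Node(node.untried.pop(), parent=node)
            node.children.append(child)
            node = child
        # Simulation (simplified): evaluate the node's own sequence.
        if node.path not in evaluated:
            evaluated[node.path] = reward(node.path)
        r = evaluated[node.path]
        # Backpropagation: update statistics along the traversal path.
        while node is not None:
            node.n += 1
            node.q += r
            node = node.parent
    return evaluated

# Toy reward (assumption): "correct" iff layer 0 is used and layer 2 is avoided.
def toy_reward(path: Path, lam: float = 0.1, full_depth: int = 4) -> float:
    correct = 0 in path and 2 not in path
    return (1.0 if correct else 0.0) - lam * len(path) / full_depth

results = mcts_cola((0, 1, 2, 3), toy_reward)
best = max(results, key=results.get)
print(best, results[best])
```

In this toy setting the full path `(0, 1, 2, 3)` is "wrong" because it passes through layer 2, while shorter edited paths that drop it are rewarded, mirroring the observation that skipping a misleading layer can fix a prediction.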
Chain-of-Layer Protocol for Taxonomy
The core loop can be formalized as follows:
- Initialize taxonomy $T_0$ with seed $r$; set $k = 0$.
- For each layer $k = 1, 2, \dots$:
  - Invoke CoL-K to select $V_k$ and parent assignments using the in-context prompt.
  - Update $T_k$ with the new edges and remove $V_k$ from the remaining pool $V$.
  - Apply the EnsembleFilter (Equation (1)), which scores candidate parent–child edges with a similarity term Sim, where Sim is inversely proportional to parent ranking in the masked LM.
  - Retain only high-scoring edges; recycled entities proceed to the next iteration.
- Terminate when $V = \emptyset$.
The process is elaborated in Algorithm 1 of (Zeng et al., 2024).
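The loop above can be sketched in a few lines. This is a hedged illustration only: the LLM call (CoL-K) and the ensemble filter are passed in as plain functions, and the stubs `toy_select` and `keep_all`, the lookup table `GOLD_PARENT`, and the `max_layers` safeguard are assumptions introduced here.

```python
from typing import Callable, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)

def chain_of_layer(
    root: str,
    entities: Iterable[str],
    select_layer: Callable[[Set[str], Set[str]], List[Edge]],
    edge_filter: Callable[[List[Edge]], List[Edge]],
    max_layers: int = 20,
) -> Tuple[List[Edge], Set[str]]:
    """Grow a taxonomy layer by layer; return (edges, entities left unplaced)."""
    edges: List[Edge] = []
    placed: Set[str] = {root}
    remaining: Set[str] = set(entities) - {root}
    for _ in range(max_layers):
        if not remaining:                 # all entities placed
            break
        candidates = select_layer(placed, remaining)  # stands in for the LLM prompt
        kept = edge_filter(candidates)                # stands in for the ensemble filter
        new_children: Set[str] = set()
        for parent, child in kept:
            if parent in placed and child in remaining and child not in new_children:
                edges.append((parent, child))
                new_children.add(child)
        if not new_children:              # no progress: stop instead of looping forever
            break
        placed |= new_children
        remaining -= new_children         # rejected entities stay in the pool
    return edges, remaining

# Toy stand-ins (assumptions): a lookup table replaces the LLM, the filter keeps all.
GOLD_PARENT = {"dog": "animal", "cat": "animal", "poodle": "dog"}

def toy_select(placed: Set[str], remaining: Set[str]) -> List[Edge]:
    return [(GOLD_PARENT[e], e) for e in sorted(remaining)
            if GOLD_PARENT.get(e) in placed]

def keep_all(candidates: List[Edge]) -> List[Edge]:
    return list(candidates)

edges, leftover = chain_of_layer("animal", ["animal", "dog", "cat", "poodle"],
                                 toy_select, keep_all)
print(edges)     # [('animal', 'cat'), ('animal', 'dog'), ('dog', 'poodle')]
print(leftover)  # set()
```

Entities whose edges are rejected are simply left in `remaining`, which is how the sketch models recycling to the next iteration.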
4. Empirical Results and Quantitative Analyses
Test-Time Depth Adaptation in LLMs
Across DART-Math (levels 1–5) and commonsense reasoning (ARC-Easy/Challenge), CoLa (MCTS-optimized) yields:
- For samples with originally correct predictions, over 75% admit a strictly shorter CoLa maintaining correctness (“C→C” transitions). Average total depth is reduced by 20–30%.
- For samples originally predicted incorrectly, over 60% are correctly resolved by some CoLa, with corrective CoLas often even shorter than C→C cases (suggesting skipping misleading layers is frequently beneficial).
- On LLaMA-3B base and ARC-Easy: original accuracy 27.8%, CoLa accuracy 95.8%; average relative depth is ∼50% (C→C) and ∼45% (W→C) of full forward path.
- Few cases are Pareto-optimal under the fixed model, indicating substantial headroom for adaptive architectures (Li et al., 10 Jul 2025).
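To make the notion of Pareto-optimality concrete, the sketch below filters a set of candidate paths down to those not dominated in (correctness, depth). The candidate names and numbers are invented for illustration and are not results from the cited work; `pareto_front` is a hypothetical helper.

```python
from typing import List, Tuple

Candidate = Tuple[str, float, int]  # (label, correctness, depth)

def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep candidates for which no other is at least as good on both axes
    (higher correctness, lower depth) and strictly better on one."""
    front = []
    for name, acc, depth in cands:
        dominated = any(
            a >= acc and d <= depth and (a > acc or d < depth)
            for _, a, d in cands
        )
        if not dominated:
            front.append((name, acc, depth))
    return front

# Illustrative values only (assumption): one input, four candidate layer paths.
candidates = [
    ("full", 1.0, 28),      # original forward pass
    ("skip-4", 1.0, 24),    # still correct, shallower
    ("skip-10", 0.0, 18),   # shallowest, but wrong
    ("repeat-2", 1.0, 30),  # correct, but deeper than needed
]
print(pareto_front(candidates))
# [('skip-4', 1.0, 24), ('skip-10', 0.0, 18)]
```

Here the full forward pass is dominated by a shorter path that is equally correct, which is the situation the bullet above describes for the fixed model.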
Taxonomy Induction Benchmarks
On WordNet, Wikipedia, DBLP, and SemEval-Sci benchmarks:
- CoL (GPT-4, 5-shot) achieves Edge-F1/A-F1 of 57.73%/79.62% (WordNet), 96.43%/— (Wiki), 47.96%/— (DBLP), and 51.59%/— (SemEval-Sci), outperforming both supervised (Graph2Taxo, CTP) and prior prompting-based (TaxonomyGPT) baselines.
- Performance drops as taxonomy size exceeds ~80 entities, exposing LLM context and reasoning limitations.
- Ablations confirm necessity of both iterative CoL-K logic (for recall) and ensemble filtering (for precision), with the combination achieving optimal overall F1 (Zeng et al., 2024).
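For readers unfamiliar with the metrics, the sketch below computes an edge-level F1 and an ancestor-level F1 under the usual definitions: F1 over predicted versus gold parent–child edges, and F1 over all ancestor–descendant pairs implied by those edges. The exact evaluation scripts of the benchmarks may differ in detail; the function names and the toy taxonomies are assumptions.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)

def f1(pred: Iterable[Edge], gold: Iterable[Edge]) -> float:
    """Harmonic mean of precision and recall over two sets of pairs."""
    pred_set, gold_set = set(pred), set(gold)
    tp = len(pred_set & gold_set)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred_set), tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)

def ancestor_pairs(edges: Iterable[Edge]) -> Set[Edge]:
    """All (ancestor, descendant) pairs implied by the parent-child edges."""
    children: Dict[str, Set[str]] = {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)
    pairs: Set[Edge] = set()

    def visit(ancestor: str, node: str) -> None:
        for ch in children.get(node, ()):
            if (ancestor, ch) not in pairs:   # also guards against cycles
                pairs.add((ancestor, ch))
                visit(ancestor, ch)

    for node in list(children):
        visit(node, node)
    return pairs

gold = [("animal", "dog"), ("animal", "cat"), ("dog", "poodle")]
pred = [("animal", "dog"), ("animal", "cat"), ("cat", "poodle")]

edge_f1 = f1(pred, gold)
ancestor_f1 = f1(ancestor_pairs(pred), ancestor_pairs(gold))
print(round(edge_f1, 4), round(ancestor_f1, 4))  # 0.6667 0.75
```

The ancestor-level score is more forgiving here because the misplaced entity still ends up under the correct root, even though its direct parent is wrong.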
5. Theoretical and Practical Implications
Dynamic Inference and Generalization in LLMs
Fixed-depth forward passes are rarely Pareto-optimal in the reported experiments; CoLa reveals a latent space of input-dependent architectures:
- Shallow “fast thinking” CoLas accelerate inference on easy tasks by up to 30%.
- Deep, recurrent “slow thinking” paths recover correct predictions for hard or noisy inputs, boosting robustness without finetuning.
- The Transformer’s architectural modularity enables layers to act as reusable, composable modules, introducing a new axis of adaptation orthogonal to weight pruning or early-exit strategies.
This suggests a route toward unified dynamic inference, combining “fast” and “slow” reasoning modes under a pretrained backbone (Li et al., 10 Jul 2025).
Structured Reasoning and Hallucination Control in Taxonomy Induction
CoL transforms taxonomy induction into a series of locally coherent, layerwise expansion tasks, achieving:
- Containment of error propagation through ensemble-based hallucination filtering.
- State-of-the-art precision and recall across diverse domains and limited-example regimes.
- Potential for extension to self-supervised, adaptive, or hybrid architectures that combine discriminative and generative strengths (Zeng et al., 2024).
A plausible implication is that iterative, layerwise reasoning paired with lightweight verification can generalize to other hierarchical or structured prediction tasks.
6. Limitations and Future Directions
Scalability and Robustness
Both interpretations of CoLa exhibit scaling limits: in LLM adaptation, search cost and path length constraints can inhibit deployment on very large models. In taxonomy induction, performance degrades as the number of entities or hierarchy depth exceeds LLM context capacity or attention span. Domain specificity and the persistence of hallucination in specialized settings remain challenges (Zeng et al., 2024).
Future directions include:
- Hierarchical self-supervision for variable layer sizes and dynamic sibling grouping in taxonomy induction.
- Hybrid protocols incorporating fine-tuned relation scorers.
- Adaptive CoLa editing strategies guided by additional meta-information or uncertainty.
7. Application Domains and Prospects
Dynamic depth adaptation via CoLa is relevant for efficiency-critical inference (search, reasoning, low-latency QA), robust deployment in error-prone environments, and resource-constrained contexts. In taxonomy induction, CoL applies to domain-specific ontology construction, knowledge base augmentation, and bootstrapping of symbolic resources in content organization. Both paradigms support interactive workflows where expert oversight or review complements automated CoLa proposals (Li et al., 10 Jul 2025, Zeng et al., 2024).