Target-Oriented Pretraining Data Selection via Neuron-Activated Graph

Published 17 Apr 2026 in cs.CL | (2604.15706v1)

Abstract: Everyday tasks come with a target, and pretraining models around this target is what turns them into experts. In this paper, we study target-oriented language model (LM) pretraining by introducing Neuron-Activated Graph Ranking (NAG-based Ranking), a training-free and interpretable framework for target pretraining data selection. Rather than using black-box representations, our approach directly characterizes each target input by a sparse set of high-impact neurons in any off-the-shelf LLM. Concretely, we quantify neuron impact and select the most influential neurons across layers into a compact Neuron-Activated Graph (NAG), and rank candidate data by NAG similarity to target examples. We conduct experiments across six benchmarks, where our NAG-based Ranking improves target-oriented pretraining by 4.9% on average over random sampling, and also outperforms state-of-the-art baselines by 5.3% accuracy on HellaSwag. It also remains effective under a more applicable multi-target setting, where our best setup surpasses two baselines by 1.1% and 4.1%, respectively. Furthermore, we provide a comprehensive analysis on why and how our NAG works, e.g., deactivating NAG-selected neurons (only 0.12% of all) causes a 23.5% performance collapse, and restricting NAG to the final layer incurs a 4.1% average drop, indicating that NAG captures a sparse "functional backbone" for learning target features. We release the code at https://github.com/asillycat/NAG.


Authors (10)

Summary

The paper presents a neuron-centric pretraining data selection method that leverages sparse neuron activation profiles to directly guide data curation.
It employs per-input neuron impact quantification across layers to construct graphs that match candidate data with target task profiles.
Empirical results show up to a 5.3% accuracy improvement and enhanced efficiency across diverse benchmarks.

Neuron-Activated Graphs for Target-Oriented Pretraining Data Selection

Motivation and Problem Setting

The effectiveness of LLM pretraining is fundamentally influenced by the alignment between the pretraining data distribution and a targeted downstream domain or task. Conventional selection paradigms, predominantly based on global data "quality" heuristics or black-box embedding similarity, fail to provide interpretable, highly task-aligned data curation. These approaches are limited by their reliance on shallow metrics or features that often disregard the intricate distribution of internal model dynamics central to the acquisition of domain- or task-specific capabilities.

This work proposes a neuron-centric paradigm for pretraining data selection, Neuron-Activated Graph (NAG), which leverages per-input sparse neuron-activation profiles as the principal selection feature. NAG forgoes black-box distillation in favor of model-based interpretability: selection is driven by quantifiable, localized neuron impact on inference computations for a given target set and candidate data.

As the authors motivate, such a framework directly aligns the selection process with the "functional backbone" of neural computation required to support a task, addressing a known gap between heuristic "quality" and actual task induction.

Figure 1: General-purpose data selection (left) and prior target selection (middle left) fail to align model-layer specificity to downstream needs. NAG selection (middle right) admits neuron-level task alignment, generalizing across distinct domains.

Methodology

Neuron Impact Quantification

NAG begins by measuring neuron impact for each input. For a given neuron (column of a projection matrix in Transformer modules, both in FFN and attention sublayers), the impact of deactivation is defined as the $\ell_2$-norm difference in the layer's output with and without the neuron. The formulation is lightweight, eschewing expensive final-output loss proxies in favor of a validated local surrogate. By extracting maximal impact across the sequence and aggregating across all layers, the method identifies a sparse subset of high-impact "activated" neurons for each instance.
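The impact measure described above can be sketched for a single linear projection. This is a minimal illustration, not the authors' implementation: it assumes the projection computes `out = W @ h` with neuron `j` owning column `j` of `W`, so that zeroing the neuron shifts the output by `h[j] * W[:, j]`. The function names `neuron_impact`, `top_k_neurons`, and `build_nag` are hypothetical.

```python
import math
from typing import List, Set

def neuron_impact(acts: List[List[float]], weight: List[List[float]]) -> List[float]:
    """Per-neuron impact for one projection (assumed form: out = W @ h).

    acts:   seq_len rows, each holding n_neurons input activations
    weight: d_out rows, each holding n_neurons entries (column j = neuron j)
    Zeroing column j changes the output at a token by acts[t][j] * W[:, j],
    whose l2 norm is |acts[t][j]| * ||W[:, j]||. We keep the max over tokens.
    """
    n_neurons = len(weight[0])
    col_norms = [math.sqrt(sum(row[j] ** 2 for row in weight)) for j in range(n_neurons)]
    return [max(abs(tok[j]) for tok in acts) * col_norms[j] for j in range(n_neurons)]

def top_k_neurons(impact: List[float], k: int) -> Set[int]:
    """Indices of the k highest-impact neurons in one layer."""
    order = sorted(range(len(impact)), key=lambda j: impact[j], reverse=True)
    return set(order[:k])

def build_nag(layer_acts, layer_weights, k: int) -> List[Set[int]]:
    """One set of top-k neuron indices per layer (a layered, sparse structure)."""
    return [top_k_neurons(neuron_impact(a, w), k)
            for a, w in zip(layer_acts, layer_weights)]

# Toy usage with made-up activations and weights (2 tokens, 3 neurons, d_out = 2).
acts = [[1.0, -2.0, 0.5], [0.0, 1.0, -3.0]]
weight = [[3.0, 0.0, 1.0], [4.0, 1.0, 0.0]]
print(top_k_neurons(neuron_impact(acts, weight), k=2))
```

In a real model the activations would come from a forward pass through an off-the-shelf LLM; the closed-form product avoids recomputing the layer once per neuron.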

Construction and Use of Neuron-Activated Graphs

These per-instance high-impact neurons are formalized as the NAG: for each input, each layer contributes the indices of $K$ top-impact neurons, forming a layered, sparse graph. Data selection then proceeds by:

Aggregating neuron activation statistics for a small target set (target NAG profile).
Ranking candidate source samples by the similarity of their induced NAG to the target NAG profile (using set overlap metrics).

Figure 2: Pipeline for NAG-based target data selection: target and candidate samples are both projected to their NAGs, and candidates are filtered by their NAG similarity to the aggregated target profile.
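The two selection steps can be sketched as follows. The summary only says "set overlap metrics", so the choices here are assumptions: the target profile keeps the most frequently selected neurons per layer, and similarity is the mean per-layer Jaccard overlap. All names (`target_profile`, `nag_similarity`, `rank_candidates`) are hypothetical.

```python
from collections import Counter
from typing import List, Set

NAG = List[Set[int]]  # one set of neuron indices per layer

def target_profile(target_nags: List[NAG], k: int) -> NAG:
    """Per layer, keep the k neurons that occur most often across target NAGs."""
    n_layers = len(target_nags[0])
    profile = []
    for layer in range(n_layers):
        counts = Counter(j for nag in target_nags for j in nag[layer])
        profile.append({j for j, _ in counts.most_common(k)})
    return profile

def nag_similarity(a: NAG, b: NAG) -> float:
    """Mean per-layer Jaccard overlap (an assumed choice of set-overlap metric)."""
    scores = []
    for sa, sb in zip(a, b):
        union = sa | sb
        scores.append(len(sa & sb) / len(union) if union else 0.0)
    return sum(scores) / len(scores)

def rank_candidates(candidates: List[NAG], profile: NAG) -> List[int]:
    """Candidate indices sorted from most to least similar to the profile."""
    sims = [nag_similarity(c, profile) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: sims[i], reverse=True)

# Toy usage with made-up two-layer NAGs.
targets = [[{1, 2}, {5, 6}], [{1, 3}, {5, 7}], [{1, 2}, {5, 6}]]
candidates = [[{8, 9}, {0, 4}], [{1, 2}, {5, 6}], [{1, 9}, {5, 4}]]
profile = target_profile(targets, k=2)
print(rank_candidates(candidates, profile))
```

Because candidate NAGs do not depend on the target, they can be extracted once and re-ranked against any new target profile.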

Unlike classifier-based or embedding-proxy approaches, NAG is training-free (when using an off-the-shelf LLM) and interpretable. The selection can be adapted to both single-task and multi-task target settings. Integration with existing classifier-based quality signals (e.g., FineWeb-Edu) is also feasible by rank aggregation.
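As an illustration of combining NAG similarity with a quality classifier by rank aggregation, the sketch below averages each document's rank under the two signals. The summary does not specify the aggregation rule, so the weighted mean rank and the names `ranks` and `aggregate_ranks` are assumptions made for this example.

```python
from typing import List

def ranks(scores: List[float]) -> List[int]:
    """Rank of each item under a score list (0 = highest score)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, idx in enumerate(order):
        result[idx] = position
    return result

def aggregate_ranks(nag_scores: List[float], quality_scores: List[float],
                    weight: float = 0.5) -> List[int]:
    """Document indices ordered by weighted mean rank (lower combined rank first)."""
    r_nag, r_quality = ranks(nag_scores), ranks(quality_scores)
    combined = [weight * a + (1.0 - weight) * b for a, b in zip(r_nag, r_quality)]
    return sorted(range(len(combined)), key=lambda i: combined[i])

# Toy usage with made-up scores for three documents.
print(aggregate_ranks([0.9, 0.1, 0.5], [0.2, 0.3, 0.8]))
```

Working on ranks rather than raw scores sidesteps the fact that NAG similarity and classifier scores live on different scales.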

Empirical Results

Main Findings

Across six challenging benchmarks spanning reasoning, factual QA, and commonsense domains, the NAG selection method provides robust benefits:

Main accuracy improvements: NAG-based selection yields a strong mean improvement of 4.9% absolute accuracy over random data curation and outperforms advanced quality-based and task-matching baselines by 1–5% (notably +5.3% on HellaSwag).
Multi-target robustness: In a scenario where multiple benchmarks are used as simultaneous targets, performance with direct mixture remains strong (e.g., +3.1% over random, +0.6% over FineWeb-Edu), whereas other target-oriented selectors often degrade significantly.
Model-agnosticism: Comparable gains are observed when constructing NAG with Qwen3-1.7B-Base, Llama-3.2-3B, or SmolLM3-3B (accuracy gains persist in the 4.7–5.0% range).

The additive utility of NAG when combined with quality classifiers further demonstrates that task-discriminative signals captured by NAG are orthogonal to more global data quality proxies.

Compute and Efficiency

NAG-based selection yields compute multipliers of 1.27–2.42$\times$ versus baselines; that is, the compute required to reach a fixed accuracy is reduced by the corresponding factor. Extraction costs are amortized: NAG feature computation requires only a single forward pass per document, and the extracted features can be reused across arbitrary downstream targets.

Figure 3: Compute multipliers (i.e., training efficiency improvements) of NAG selection relative to baselines across several benchmarks, with CM $>1$ indicating greater efficiency.
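To make the compute multiplier concrete, one way to write it is as the ratio of training compute needed to reach the same accuracy $a$. The symbols $C_{\text{base}}(a)$ and $C_{\text{NAG}}(a)$ are notation assumed here for illustration, not taken from the paper.

```latex
\mathrm{CM}(a) = \frac{C_{\text{base}}(a)}{C_{\text{NAG}}(a)},
\qquad
\frac{C_{\text{NAG}}(a)}{C_{\text{base}}(a)} = \frac{1}{\mathrm{CM}(a)},
\qquad
\frac{1}{1.27} \approx 0.79,
\quad
\frac{1}{2.42} \approx 0.41
```

Under this reading, the reported range of 1.27–2.42$\times$ would correspond to NAG-selected data reaching the same accuracy with roughly 79% down to 41% of the baseline's compute.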

Analysis: Mechanisms and Interpretability

Task Decomposition and Neuron Importance

The ablation study finds that deactivating the NAG-selected neurons, which make up as little as 0.12% of the model (20 neurons per layer), induces a catastrophic 23.5% mean accuracy drop, while deactivating random neurons has a negligible effect. This supports the hypothesis that NAG isolates the functional backbone actually responsible for the targeted capability.

Figure 4: Clusters formed from NAG features strongly correspond to ground-truth task divisions, demonstrating that NAG encodes task-discriminative information.

Task Discriminativity and Information Localization

The ability of NAG features to cluster data by task (clear separation, reflecting task-family relationships) illustrates that the selected neurons systematically encode task-relevant computation; see t-SNE projections and clustering metrics.
Downstream selection utility is monotonic in NAG-target similarity: aggressive filtering using NAG similarity continues to improve target performance, peaking with a small top-$r_f$ fraction (Figure 5).

Figure 5: Downstream performance as a function of filtering threshold $r_f$; NAG selects high-utility data robustly where alternatives degrade.
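Filtering by a top-$r_f$ fraction amounts to keeping the highest-scoring share of the candidate pool. A minimal sketch, assuming each candidate already has a NAG similarity score; the function name `filter_top_fraction` and the rounding-up rule are hypothetical choices for this example.

```python
import math
from typing import List

def filter_top_fraction(scores: List[float], r_f: float) -> List[int]:
    """Indices of the top r_f fraction of candidates by score, best first."""
    if not 0.0 < r_f <= 1.0:
        raise ValueError("r_f must be in (0, 1]")
    # Round up so that a non-empty pool always keeps at least one candidate.
    n_keep = max(1, math.ceil(r_f * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:n_keep]

# Toy usage with made-up similarity scores: keep the top half.
print(filter_top_fraction([0.2, 0.9, 0.5, 0.7], r_f=0.5))
```

A smaller `r_f` filters more aggressively, which is the regime where the summary reports NAG-based ranking holding up best.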

Design Choices: Neuron Type, Layer, Sparsity

FFN up-projection ("up_proj") neurons provide maximal target-oriented signal; other neuron types (e.g., attention K-proj) are weaker (Figure 6).
NAG efficacy depends on a multi-layer representation: restricting NAG to the final layer yields a mean degradation of 4.1%, frequently dropping below random selection.
Layer-wise neuron selection sparsity is critical. Peak accuracy is obtained at a sparsity ratio of $r_k=0.3\%$ of neurons per layer; moving toward dense selection destroys the signal's utility (Figure 7).

Figure 7: NAG performance as a function of neuron sparsity ratio; optimal data selection relies on extremely sparse, high-impact neuron subsets.

Theoretical and Practical Implications

This work demonstrates that neuron-level structural features, which go well beyond surface text similarity or task-agnostic "quality" heuristics, yield highly selective, efficient, and interpretable data curation for task-aligned pretraining. By characterizing data through the lens of sparse, layer-distributed computational pathways, one directly aligns the data selection mechanism with the underlying representational and procedural substrate for the target task.

The practical implication is a robust, training-free selection process transferable across model scales and architectures (including smaller extraction models), compatible with real-world multi-target settings, and composable with existing pipelines. Theoretical insights flow toward a more granular understanding of which precise model subsystems (neurons) are required for task generalization, and how these can be targeted through pretraining data.

Methodological advances may extend toward even finer-grained, sparsity-controlled pathway manipulation (neurons, mechanisms, even attention heads). Applications beyond data selection—e.g., model editing, modularity analysis, targeted domain adaptation—are well-motivated by these findings.

Conclusion

The Neuron-Activated Graph framework establishes a new standard for target-oriented pretraining data selection, grounded in sparse, interpretable neuron activity. It closes the loop between the selection procedure and the emergent circuit structure underlying task competence, outperforming both heuristic and learned black-box alternatives. Future research will refine NAG pathway modeling, explore scaling to even larger LLMs and diverse corpus compositions, and integrate more advanced algorithmic strategies for multi-target mixture. The direct use of model-internal computational structure for steering data curation opens a promising direction for both practical model development and the scientific study of LLM internal representations.