GRASP LoRA: Guided Adapter Sparsity via GRPO
- The paper proposes GRASP LoRA, a method that leverages GRPO for dynamically optimizing adapter sparsity, achieving up to 7.45× faster fine-tuning on cross-lingual tasks.
- GRASP LoRA is a parameter-efficient approach that merges English and target language LoRA adapters and employs magnitude-based pruning with learnable global prune ratios.
- Experimental results demonstrate improved metrics in summarization and QA tasks while significantly reducing computational costs, highlighting its practical efficiency.
GRASP LoRA (GRPO Guided Adapter Sparsity Policy) is a parameter-efficient fine-tuning methodology designed for cross-lingual transfer of LLMs under limited computational and data resources. Unlike conventional adapter pruning pipelines that rely on grid search over sparsity ratios—an approach both resource-intensive and coarse—GRASP LoRA transforms the global sparsity ratio into a learnable control variable optimized online by a Group-Relative Policy Optimization (GRPO) controller using minimal development data (Hassan et al., 10 Jan 2026).
1. GRPO Controller: Mathematical Formulation and Optimization
At the core of GRASP LoRA is a stochastic policy for the global prune ratio $p$, parameterized by a univariate Gaussian with parameters $\theta = (\mu, \log\sigma)$. Every $T$ optimizer steps, the controller samples $K$ candidate prune ratios $p_k \sim \mathcal{N}(\mu, \sigma^2)$, clipped to the admissible range $[p_{\min}, p_{\max}]$. Magnitude-thresholded binary masks $m(p_k)$ are constructed over the merged LoRA weights $\Delta W$, each inducing a pruned subnetwork. Candidate losses $L_k$ and a baseline loss $L_{\text{base}}$ (at the current ratio $p_{\text{cur}}$) are evaluated on a fixed micro development slice of target-language examples; the reward $r_k = L_{\text{base}} - L_k$ serves as the learning signal. The policy is optimized via the GRPO surrogate

$$J(\theta) = \frac{1}{K} \sum_{k=1}^{K} A_k \log \pi_\theta(p_k) \;-\; \lambda\,(\mu - p_{\text{cur}})^2 \;+\; \beta\,\mathcal{H}(\pi_\theta), \qquad A_k = r_k - \frac{1}{K}\sum_{j=1}^{K} r_j,$$

where the $\lambda$ term regularizes $\mu$ towards the current ratio and the entropy bonus $\mathcal{H}(\pi_\theta)$ discourages premature collapse of $\sigma$. Score-function gradients $\nabla_\mu J$ and $\nabla_{\log\sigma} J$ are computed from the centered advantages $A_k$, and parameter updates are constrained within admissible bounds. Commitment to a new prune ratio occurs only if no micro-dev loss increase is observed, with a bounded step size from $p_{\text{cur}}$.
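The controller update above can be sketched in a few lines of numpy. This is a minimal illustration of the GRPO probe, not the authors' implementation: the helper `eval_loss` (micro-dev loss at a given prune ratio), the commit step bound `max_step`, and the specific hyperparameter defaults are assumptions of this sketch.

```python
import numpy as np

def grpo_controller_step(mu, log_sigma, eval_loss, p_cur,
                         K=4, lam=0.1, beta=0.01, lr=0.05,
                         p_min=0.05, p_max=0.9, max_step=0.1, rng=None):
    """One probe: sample K candidate prune ratios, score them against the
    current ratio on the micro-dev slice, and update (mu, log_sigma) with
    score-function gradients on group-centered advantages.
    eval_loss(p) -> micro-dev loss under the mask induced by prune ratio p.
    """
    rng = rng or np.random.default_rng(0)
    sigma = np.exp(log_sigma)
    cands = np.clip(rng.normal(mu, sigma, size=K), p_min, p_max)
    base = eval_loss(p_cur)                        # baseline micro-dev loss
    rewards = np.array([base - eval_loss(p) for p in cands])
    adv = rewards - rewards.mean()                 # group-relative advantages

    # Score-function gradients of the Gaussian log-density, plus the
    # mean-anchoring (lam) and entropy (beta) regularizers.
    g_mu = np.mean(adv * (cands - mu) / sigma**2) - 2.0 * lam * (mu - p_cur)
    g_ls = np.mean(adv * ((cands - mu)**2 / sigma**2 - 1.0)) + beta
    mu = float(np.clip(mu + lr * g_mu, p_min, p_max))
    log_sigma = float(log_sigma + lr * g_ls)

    # Commit only if the best candidate does not increase micro-dev loss,
    # and move at most max_step away from the current ratio.
    if rewards.max() >= 0.0:
        best = float(cands[np.argmax(rewards)])
        new_p = p_cur + float(np.clip(best - p_cur, -max_step, max_step))
    else:
        new_p = p_cur
    return mu, log_sigma, new_p
```

With a toy quadratic loss minimized at $p = 0.5$, repeated calls drift the committed ratio toward $0.5$ while the anchoring term keeps individual moves small.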
2. End-to-End Algorithmic Workflow
GRASP LoRA interleaves adapter fine-tuning, policy-guided pruning, and evaluation in three main phases:
- Adapter Training and Merging
- English LoRA adapters are trained on high-resource English data with the backbone model frozen.
- Target-language LoRA adapters are trained on low-resource target language data, again with a frozen backbone.
- The adapters are merged by summing their low-rank update matrices at each projection site: $\Delta W = B_{\text{en}} A_{\text{en}} + B_{\text{tgt}} A_{\text{tgt}}$.
- Sparsity Policy Learning (Controller Rounds)
- Initialize the prune ratio $p_0$ and the controller parameters $(\mu_0, \log\sigma_0)$.
- In each controller round:
- Fine-tune the merged adapters on target data under the current mask $m(p_{\text{cur}})$.
- Every $T$ steps, probe candidate prune ratios, evaluate the corresponding micro-dev losses, and update the controller policy via the GRPO surrogate.
- Commit to a new prune ratio if improvement is observed, update the masks, and clear optimizer states for newly zeroed parameters.
- Upon completion, select the final prune ratio $p^{*}$ via post-hoc validation-loss minimization.
- Final Pruning and Fine-tuning
- Reload the frozen backbone and pre-controller merged adapters.
- Build the final mask and fine-tune the masked model on the full target data until early stopping on a held-out dev set.
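The three phases above can be summarized in pseudocode; all helper names here are illustrative, not from the paper:

```
# Phase 1: adapter training and merging
en_adapters  = train_lora(frozen_backbone, english_data)
tgt_adapters = train_lora(frozen_backbone, target_data)
merged       = {site: en_adapters[site] + tgt_adapters[site] for site in sites}

# Phase 2: sparsity policy learning (controller rounds)
p, policy = p0, init_gaussian_policy()
for round in controller_rounds:
    for step in training_steps:
        finetune_step(merged, target_data, mask(merged, p))
        if step % T == 0:
            p, policy = grpo_probe_and_update(p, policy, micro_dev)
p_star = argmin_over_logged_ratios(validation_loss)

# Phase 3: final pruning and fine-tuning
merged = reload(frozen_backbone, pre_controller_adapters)
finetune_until_early_stop(apply_mask(merged, p_star), target_data, dev_set)
```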
3. Adapter Merging and Magnitude-based Pruning
LoRA adapter merging is performed by summing the low-rank update matrices for source and target languages, $\Delta W = B_{\text{en}} A_{\text{en}} + B_{\text{tgt}} A_{\text{tgt}}$. Pruning is implemented tensor-wise: for prune ratio $p$, the mask is determined by retaining the top $(1-p)$ fraction of magnitude-ordered entries per tensor. Let $n_t$ be the number of parameters in tensor $W_t$, $k_t = \lceil p\, n_t \rceil$, and $\tau_t$ the $k_t$-th order statistic of the entry magnitudes $|W_t|$. Then

$$m_{t,ij} = \mathbb{1}\!\left[\, |W_{t,ij}| > \tau_t \,\right].$$
This mask is applied to all adapters on the (frozen) backbone model.
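A minimal numpy sketch of the merge-then-mask step follows; the matrix shapes and the random test tensors are illustrative assumptions, not values from the paper.

```python
import numpy as np

def merge_adapters(B_en, A_en, B_tgt, A_tgt):
    """Merged LoRA update at one projection site:
    Delta W = B_en @ A_en + B_tgt @ A_tgt."""
    return B_en @ A_en + B_tgt @ A_tgt

def magnitude_mask(w, p):
    """Tensor-wise binary mask keeping the largest-magnitude (1 - p)
    fraction of entries; the ceil(p * n) smallest entries are pruned."""
    k = int(np.ceil(p * w.size))                   # entries to prune
    mask = np.ones(w.size, dtype=bool)
    if k > 0:
        prune_idx = np.argsort(np.abs(w).ravel())[:k]
        mask[prune_idx] = False
    return mask.reshape(w.shape)

# Toy example: rank-8 adapters merged at a 16x16 projection site.
rng = np.random.default_rng(0)
dW = merge_adapters(rng.normal(size=(16, 8)), rng.normal(size=(8, 16)),
                    rng.normal(size=(16, 8)), rng.normal(size=(8, 16)))
m = magnitude_mask(dW, p=0.7)
pruned = dW * m
```

Because the mask is computed per tensor, each projection site retains exactly its own top-$(1-p)$ entries rather than competing in a global ranking.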
4. Experimental Protocol and Hyperparameters
The evaluation covers cross-lingual transfer for summarization and extractive QA:
- Datasets:
- XL-Sum (English→Arabic, English→Chinese): English train/dev 10k/1k; Arabic/Chinese train 50, dev 50, micro 16, test 100.
- MLQA (extractive QA): English train 3k; Arabic/Chinese train 50, micro 16, test 100.
- Model and PEFT Setup:
- Backbone: Llama 3 8B (frozen).
- LoRA applied to Q and V projections; rank 8, dropout 0.05.
- Optimization and Controller:
- Adapter fine-tuning: 10 epochs, AdamW optimizer, batch size 1, max input length 2200 tokens.
- GRPO settings: bounded prune range $[p_{\min}, p_{\max}]$, periodic probes every $T$ optimizer steps with $K$ sampled candidates per probe, a 16-example micro-dev slice, and controller learning rate $0.05$.
- The regularization coefficients (mean-anchoring weight $\lambda$ and entropy weight $\beta$) are tuned per task, with separate values for Arabic XL-Sum, Chinese XL-Sum, Arabic MLQA, and Chinese MLQA.
- Evaluation Metrics:
- Summarization: BERTScore-F1, BLEU-4, ROUGE-L (and additional variants).
- QA: BERTScore-F1, Exact Match, token F1 (plus BLEU/ROUGE/chrF for spans).
- Prompt Structure: Unchanged across languages; e.g., for summarization, "Article:{article} → Summary:"; for QA, "Context:{context} Question:{question} → Answer:".
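The fixed prompt structure can be expressed as simple templates. The field names match the protocol description above, but the exact whitespace and newline placement are assumptions of this sketch.

```python
# Prompt templates following the paper's protocol; newline placement
# between fields is an assumption of this illustration.
SUMMARIZATION = "Article:{article}\nSummary:"
QA = "Context:{context}\nQuestion:{question}\nAnswer:"

def build_prompt(template: str, **fields) -> str:
    """Fill a template; the model generates after the trailing label."""
    return template.format(**fields)
```

Keeping one template per task across languages isolates the effect of the sparsity policy from prompt variation.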
5. Empirical Results and Performance Analysis
GRASP LoRA demonstrates consistent improvements over strong merge-and-prune grid search baselines:
| Task | Baseline Prune | GRASP Prune | BERT-F1 Δ | BLEU-4 Δ | ROUGE-L Δ | EM Δ | F1 Δ | Time Δ |
|---|---|---|---|---|---|---|---|---|
| XL-Sum Arabic | 70% | 67.49% | +0.88 | +1.75 | +2.13 | — | — | 3.90× faster |
| XL-Sum Chinese | 50% | 56.94% | +1.62 | +1.73 | +1.45 | — | — | 5.66× faster |
| MLQA Arabic | 40% | 48.97% | +0.56 | — | — | +2.67 | +2.22 | 6.40× faster |
| MLQA Chinese | 10% | 23.73% | +1.98 | — | — | +1.50 | +0.67 | 7.45× faster |
Additional findings include:
- The approach reduces end-to-end runtime by a factor of 4–7× compared to grid search baselines.
- Improvements are robust to micro-dev size: the selected prune ratio and BERT-F1 remain stable across a range of micro-dev slice sizes.
- Regularization ablations show that removing entropy or mean anchoring leads to excessive pruning (~79%) and a 1–2 point drop in evaluation metrics.
- Qualitative analysis shows superior semantic faithfulness in summarization and more accurate answer extraction in QA compared to baselines.
6. Implementation Considerations and Practical Implications
Key features for faithful and efficient deployment include:
- Use of a small, fixed micro-dev slice (16 examples) for all controller evaluations, independent of the early-stop dev set.
- All controller reward computations, pruning evaluations, and commitment decisions are logged, enabling post-hoc selection if needed.
- Identical prompt templates and consistent adapter architectures facilitate experimentation across unrelated linguistic domains.
- The learnable sparsity policy makes it feasible to select fractional sparsity optima impractical under conventional discrete grid search, especially in low-resource settings.
A plausible implication is that GRASP LoRA extends reliable adapter reuse to previously intractable low-resource regimes by decoupling sparsity hyperparameter tuning from costly grid search. The method offers a systematic pathway for tuning adapter sparsity using only minimal dev resources, providing improved model quality, content coverage, and answer quality relative to strong baselines and yielding major reductions in both computational and annotation costs (Hassan et al., 10 Jan 2026).