BSharedRAG: Unified Retrieval & Generation
- The paper introduces BSharedRAG, a framework that uses a shared pre-trained backbone with task-specific LoRA adapters to integrate retrieval and generation.
- It employs continual domain-specific pre-training and hard negative mining, achieving significant gains (e.g., Hit@3 up to 68.4% and BLEU-3 of 12.63) over traditional models.
- The approach enhances parameter efficiency and robustness for dynamic fields such as e-commerce, with potential to extend to domains like medicine and law.
Backbone Shared Retrieval-Augmented Generation (BSharedRAG) formalizes an architecture in which both retrieval and generation modules employ a single, continually pre-trained backbone LLM, with distinct task-specialized adapters. BSharedRAG improves domain adaptation, resource efficiency, and grounding fidelity versus architectures that maintain separate retriever/generator backbones. The paradigm is motivated by the need for up-to-date, factual, and context-aware knowledge integration—especially in domains such as e-commerce and medicine—while avoiding the negative transfer and tuning complexity of fully joint multitask training.
1. Motivations and Design Principles
Traditional Retrieval-Augmented Generation (RAG) systems deploy separate modules for retrieval (often dense bi-encoders or dual-encoders) and generation (decoder-only LLMs), optimized exclusively for their respective tasks. This modularity precludes mutual knowledge transfer: retrievers and generators encode language differently, making relevance scores misaligned with generator utility. Joint multitask training, wherein both heads share all model weights, introduces negative transfer and the practical difficulty of balancing losses, as retrieval and generation gradients may conflict or interfere, especially in domains with sparse or highly heterogeneous data (Guan et al., 30 Sep 2024).
BSharedRAG introduces a compromise: a shared backbone LLM (often continually pre-trained on domain-specific corpora), with lightweight, plug-and-play Low-Rank Adaptation (LoRA) modules—separately optimized for retrieval and generation objectives—attached to the backbone. The backbone itself is frozen during downstream training, with only LoRA adapters tuned for each task. This arrangement preserves efficient parameter sharing, enables transfer of domain representations, and allows task specialization through adapters.
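The core mechanism can be illustrated with a minimal PyTorch sketch (not the authors' released code; the layer choice, rank, and dimensions are illustrative assumptions): a frozen base projection carries two independently trained low-rank deltas, and a per-task switch selects which delta is active.

```python
import torch
import torch.nn as nn

class DualLoRALinear(nn.Module):
    """A frozen linear layer with two task-specific LoRA deltas (retrieval / generation)."""
    def __init__(self, in_dim, out_dim, rank=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim, bias=False)
        self.base.weight.requires_grad_(False)       # shared backbone weight stays frozen
        self.scale = alpha / rank
        # One (A, B) pair per task; only these matrices receive gradients.
        self.lora = nn.ModuleDict({
            task: nn.ModuleDict({
                "A": nn.Linear(in_dim, rank, bias=False),
                "B": nn.Linear(rank, out_dim, bias=False),
            })
            for task in ("retrieval", "generation")
        })
        for adapters in self.lora.values():
            nn.init.zeros_(adapters["B"].weight)      # each adapter starts as a no-op delta
        self.active = "retrieval"

    def set_adapter(self, task):
        self.active = task

    def forward(self, x):
        a = self.lora[self.active]
        return self.base(x) + self.scale * a["B"](a["A"](x))

layer = DualLoRALinear(in_dim=4096, out_dim=4096)
x = torch.randn(2, 4096)
layer.set_adapter("retrieval")    # retrieval path: frozen base + retrieval delta
y_ret = layer(x)
layer.set_adapter("generation")   # generation path: frozen base + generation delta
y_gen = layer(x)
```

In practice the same effect is obtained by attaching named LoRA adapter sets to every targeted projection of the backbone (e.g., via a PEFT-style library) and activating the retrieval or generation set per call: the retrieval adapter is used to embed queries and documents, the generation adapter to produce answers, and the continually pre-trained backbone weights are never modified.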
2. Architecture and Optimization Targets
The BSharedRAG pipeline comprises three primary stages:
- Backbone Pre-training: The LLM backbone θ_s undergoes continual pre-training on mixed domain data (e.g., e-commerce product reviews, medical records), maximizing the standard next-token prediction objective over all corpus tokens: $\mathcal{L}_{\text{CPT}}(\theta_s) = -\sum_{t} \log p_{\theta_s}(x_t \mid x_{<t})$.
- Adapter Fine-tuning:
- Retriever Adapter (θ_r): LoRA modules are attached to the backbone for dense retrieval. Query and document embeddings are taken from the final-layer EOS token representation (following RepLLaMA). The retrieval objective is the InfoNCE contrastive loss with hard negatives mined from strong baseline retrievers: $\mathcal{L}_{\text{ret}} = -\log \frac{\exp(s(q,d^{+})/\tau)}{\exp(s(q,d^{+})/\tau) + \sum_{d^{-}} \exp(s(q,d^{-})/\tau)}$, where $s(\cdot,\cdot)$ scores query–document similarity and $\tau$ is a temperature (see the training sketch after this list).
- Generator Adapter (θ_g): LoRA modules are tuned for instruction-following generation over (query, retrieved document, answer) triples, minimizing the language-modeling loss on the answer tokens: $\mathcal{L}_{\text{gen}} = -\sum_{t} \log p_{\theta_s,\theta_g}(a_t \mid q, d, a_{<t})$.
The backbone θ_s is shared and frozen during both adapter optimizations.
- Inference and Modular Deployment: At test time, the retriever and generator use their respective adapters above the shared backbone—enabling parameter efficiency and maximal transfer of domain knowledge.
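As noted in the retriever-adapter bullet above, EOS pooling and the InfoNCE objective can be sketched as follows. This is a simplified illustration under assumed shapes, temperature, and cosine similarity, not the released training code.

```python
import torch
import torch.nn.functional as F

def eos_pool(hidden_states, attention_mask):
    """Final-layer hidden state at the last non-padded (EOS) position, RepLLaMA-style."""
    last_idx = attention_mask.sum(dim=1) - 1                   # (batch,)
    batch_idx = torch.arange(hidden_states.size(0), device=hidden_states.device)
    return hidden_states[batch_idx, last_idx]                  # (batch, dim)

def info_nce_loss(q_emb, pos_emb, hard_neg_emb, temperature=0.05):
    """Contrastive loss over positives, in-batch negatives, and mined hard negatives."""
    q = F.normalize(q_emb, dim=-1)
    docs = F.normalize(torch.cat([pos_emb, hard_neg_emb], dim=0), dim=-1)
    logits = q @ docs.t() / temperature                        # (batch, batch + n_hard)
    labels = torch.arange(q.size(0), device=q.device)          # query i matches positive i
    return F.cross_entropy(logits, labels)

# Toy stand-ins for backbone (+ retrieval LoRA) outputs: batch 4, seq len 16, hidden 4096.
q_hidden = torch.randn(4, 16, 4096, requires_grad=True)
d_hidden = torch.randn(4, 16, 4096, requires_grad=True)
hard_negs = torch.randn(8, 4096, requires_grad=True)           # 2 mined negatives per query
mask = torch.ones(4, 16, dtype=torch.long)

loss = info_nce_loss(eos_pool(q_hidden, mask), eos_pool(d_hidden, mask), hard_negs)
loss.backward()   # in the full setup, gradients reach only the retrieval LoRA parameters
```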
This architecture avoids both the parameter duplication of maintaining fully separate retriever/generator models and the negative transfer and loss-balancing instability of fully shared multitask training (see Fig. 1 of (Guan et al., 30 Sep 2024)).
3. Experimental Benchmarking and Results
BSharedRAG was validated on a large-scale Chinese e-commerce corpus (WorthBuying, 735K+ documents) and public datasets (CPR-Ecom). Training used GPT-4-curated QDA tuples (question, document, answer); evaluation covered standard retrieval benchmarks (C-MTEB) and human/expert-annotated test splits.
Retrieval Performance:
- On CPR-Ecom, Hit@3 = 68.4% (BSharedRAG retriever) vs. 60.35% for the baseline BGE-large-zh with hard negatives: an 8.05-point absolute gain (roughly 13% relative).
- On WorthBuying, Hit@3 = 58.6% vs. 55.7%: a 2.9-point absolute gain (roughly 5% relative).
Generation Performance:
- On WorthBuying, BLEU-3 = 12.63, a 23% relative improvement over the best baseline, with competitive or superior ROUGE-L, BERTScore, and GPT-4 accuracy scores.
- Fully shared RAG architectures (joint optimization) yielded much lower scores (BLEU-3: 2.35).
Analysis:
- Ablations confirmed that continual pre-training (CPT) and hard negative mining are both necessary; omitting either significantly degrades performance.
- Improved retriever alignment is consistently reflected in higher generation fidelity—the preferences encoded in document representations better match generator input requirements.
Table: Quantitative comparison (selected results from (Guan et al., 30 Sep 2024)):
| Model | Hit@3 (CPR-Ecom) | BLEU-3 (WorthBuying) |
|---|---|---|
| BSharedRAG (retrieval) | 68.4% | - |
| BGE-large-zh+HN (baseline) | 60.35% | - |
| BSharedRAG (generation) | - | 12.63 |
| Baichuan2-7b-chat+RAG-IT | - | 10.24 |
| Fully shared RAG | - | 2.35 |
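For reference, the Hit@k numbers above measure whether a relevant document appears among a query's top-k retrieved results. A minimal sketch with hypothetical document IDs (not the benchmark harness):

```python
def hit_at_k(ranked_doc_ids, gold_doc_ids, k=3):
    """1 if any gold document appears in the top-k ranked list for a query, else 0."""
    return int(any(doc in gold_doc_ids for doc in ranked_doc_ids[:k]))

# Hypothetical retrieval runs: one gold document per query.
runs = [
    (["d7", "d2", "d9", "d4"], {"d2"}),   # hit at rank 2
    (["d1", "d3", "d8", "d5"], {"d5"}),   # gold only at rank 4, so a miss for k=3
]
score = sum(hit_at_k(r, g, k=3) for r, g in runs) / len(runs)
print(f"Hit@3 = {score:.2%}")             # 50.00%
```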
4. Efficiency, Parameterization, and Scaling
BSharedRAG achieves efficiency by:
- Avoiding double backbone parameterization—retriever and generator share the large, domain-adapted LLM.
- Limiting task adaptation to small low-rank matrices via LoRA, yielding strong performance improvements at minimal computational cost (a rough parameter-count sketch follows this list).
- Facilitating scaling: inference time and GPU/memory requirements remain close to those of a single backbone plus modest adapters, suitable for large-scale, high-throughput deployments.
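A back-of-the-envelope parameter count makes the efficiency argument concrete; the rank, hidden size, layer count, and targeted projections below are illustrative assumptions rather than the paper's reported configuration.

```python
# Rough LoRA parameter count for a 7B-class decoder with 32 layers, hidden size 4096,
# applying rank-8 adapters to two projections per layer (illustrative choices only).
layers, hidden, rank, targets_per_layer = 32, 4096, 8, 2
lora_params_per_task = layers * targets_per_layer * (hidden * rank + rank * hidden)
backbone_params = 7_000_000_000

print(f"LoRA params per task: {lora_params_per_task:,}")       # ~4.2M
print(f"Two adapters vs. one extra backbone: {2 * lora_params_per_task:,} "
      f"vs. {backbone_params:,} additional parameters")
```

Even with both task adapters trained, the added parameters amount to well under 1% of what a second full backbone would cost in a separate-retriever design.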
Architecture diagrams (cf. Fig. 2 (Guan et al., 30 Sep 2024)) clarify flow: backbone pre-training, retrieval adapter training, generation adapter training, and modular inference.
5. Domain Adaptation, Robustness, and Generalizability
In e-commerce, BSharedRAG demonstrates robustness for long-tail entity queries and fast-changing product descriptions: continual pre-training and adapter modularity enable timely integration of newly available domain knowledge. The training corpus (WorthBuying) consists of professional product reviews, with GPT-4 used to generate the QDA training tuples and human validation applied to the evaluation sets.
Beyond e-commerce, the authors suggest that the architecture generalizes to other domains (medicine, law) with high entity churn and demand for factual accuracy (Sharma, 28 May 2025). The backbone sharing approach is orthogonal to advancements in retrieval reranking, multi-hop fusion, and semantic bridging (e.g., via R²AG (Ye et al., 19 Jun 2024)), and can be combined with such techniques to further improve performance.
6. Comparison to Related Frameworks and Variants
Relative to separate RAG architectures and fully shared multitask RAG, BSharedRAG offers:
- Improved domain transfer: continual backbone pre-training aligns both retrieval and generation representations without sacrificing specialization.
- Avoidance of negative transfer: adapters tuned separately, backbone remains frozen during fine-tuning.
- Resource efficiency: one backbone, two lightweight adapters.
- Strong performance on both retrieval and generation metrics, outperforming strong baseline retrievers and generators even when those baselines are domain-adapted or use fusion (Guan et al., 30 Sep 2024).
Table: Key architectural contrasts (as per (Guan et al., 30 Sep 2024)):
| Approach | Backbones | Optimization | Pros | Cons |
|---|---|---|---|---|
| Separate RAG | 2 | Separate task losses | Max modularity | No knowledge transfer |
| Fully shared RAG | 1 | Joint loss (λ-tuned) | High coordination | Negative transfer, tunability issues |
| BSharedRAG | 1 | Per-task LoRA adapters | Balanced transfer/specialization | Relies on backbone quality |
A plausible implication is that BSharedRAG will become a preferred template for domain-specific RAG systems requiring both factuality and operational efficiency.
7. Limitations and Future Directions
- Current BSharedRAG implementations are monolingual (Chinese) and domain-constrained; experiments on other languages and settings remain to be conducted.
- Only LoRA-based adapters are evaluated—extensions to other lightweight adaptation strategies, mixture-of-experts, or joint-feature modules may be worthwhile.
- Future work will test integration with query rewriting, advanced reranking, chain-of-thought augmentation, and enhanced factuality constraints to further improve grounding and robustness.
The authors note potential for generalization, but these claims should be empirically validated. Dataset, code, and models are released for public research at https://bsharedrag.github.io.
In summary, Backbone Shared Retrieval-Augmented Generation is characterized by a continually pre-trained domain-specific LLM, frozen during fine-tuning, with small, plug-and-play adapters for each of retrieval and generation tasks. This architectural arrangement yields strong performance improvements and parameter efficiency, enabling practical deployment in complex, dynamic domains such as e-commerce, with demonstrated results in retrieval accuracy and generation fidelity.