OpenDecoder: Quality-Aware RAG Decoding & an Open Portuguese LLM
- OpenDecoder is a technique that incorporates explicit document quality scores into the decoder's attention mechanism to improve retrieval-augmented generation robustness.
- Gervásio PT is a fully open, instruction-tuned Portuguese decoder-only Transformer model based on LLaMA 2 7B, enabling advanced language research and commercial applications.
- Both approaches emphasize open access, reproducibility, and adaptability to diverse language tasks, offering significant performance gains and methodological transparency.
OpenDecoder refers to two distinct, influential developments in open LLM research. The first is "OpenDecoder: Open LLM Decoding to Incorporate Document Quality in RAG," a decoding algorithm that integrates explicit document quality signals into retrieval-augmented generation pipelines to enhance robustness and answer fidelity (Mo et al., 13 Jan 2026). The second, exemplified by "Gervásio PT," is a fully open, instruction-tuned, decoder-only Transformer model for Portuguese, designed to enable broad research and commercial use (Santos et al., 2024). The following sections profile the architectural innovations, methodologies, evaluation paradigms, licensing, and broader impact of these projects.
1. Architectural Innovations
OpenDecoder for Quality-aware RAG
OpenDecoder modifies standard decoder-only Transformers used in retrieval-augmented generation (RAG) by introducing external document evaluation features as gating mechanisms within the attention computation. The canonical pipeline consists of:
- A retriever extracts the top-$k$ documents $\{d_1, \dots, d_k\}$ for a query $q$.
- Each $d_i$ receives explicit external quality scores: a retriever similarity score ($s^{\mathrm{ret}}_i$), an LLM-based ranking score ($s^{\mathrm{llm}}_i$), and a query performance prediction (QPP) score ($s^{\mathrm{qpp}}_i$).
- These scores are normalized and assembled into a mask matrix $M$ mapping the fused scores to the positions of document tokens.
- The attention mechanism of the decoder LLM is modified to include $M$ as a multiplicative mask on the attention logits, altering the token generation probability distribution.
Attention computation becomes:

$$\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}} \odot M\right)V,$$

where $\odot$ denotes element-wise multiplication and $M$ is the quality-score mask over key positions (Mo et al., 13 Jan 2026).
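The quality-gated attention described above can be sketched in a few lines of NumPy. The single-head layout, toy dimensions, and the per-position mask vector `m` are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def quality_gated_attention(Q, K, V, m):
    """Single-head attention with a multiplicative quality mask.

    Q, K, V: (seq, d_k) arrays; m: (seq,) per-token quality weights
    (1.0 for query/instruction tokens, fused scores for document tokens).
    """
    d_k = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d_k)   # (seq, seq) attention logits
    gated = logits * m[None, :]       # element-wise gate on key positions
    return softmax(gated, axis=-1) @ V

# Toy example: 4 tokens, the last two coming from a low-quality document.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
m = np.array([1.0, 1.0, 0.3, 0.3])   # down-weight document tokens
out = quality_gated_attention(Q, K, V, m)
```

Because the gate acts on the logits before the softmax, the attention weights over each row still sum to one; the mask only reshapes the distribution away from low-quality positions.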
Gervásio PT: Open Decoder for Portuguese
Gervásio PT is a 7B-parameter, decoder-only Transformer based strictly on LLaMA 2 7B, targeting both European (PTPT) and Brazilian (PTBR) Portuguese. Key architectural features:
- 32 transformer layers; 32 attention heads; hidden size 4096; feed-forward dimension 11008.
- 32k subword vocabulary.
- All modifications are software-level (continued causal LM training); architectural form and dimensions match LLaMA 2 7B (Santos et al., 2024).
No architectural changes occur beyond language- and task-specific fine-tuning.
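As a sanity check on these dimensions, here is a back-of-the-envelope parameter count, a rough sketch assuming the standard LLaMA 2 layout (untied input/output embeddings, SwiGLU MLPs) and ignoring the comparatively tiny norm and rotary-embedding terms:

```python
# Approximate parameter count for the LLaMA 2 7B shape that Gervásio PT inherits.
vocab, d_model, d_ff, n_layers = 32_000, 4096, 11_008, 32

embed = vocab * d_model           # input embedding
lm_head = vocab * d_model         # output projection (untied)
attn = 4 * d_model * d_model      # Wq, Wk, Wv, Wo per layer
mlp = 3 * d_model * d_ff          # gate, up, down projections (SwiGLU)
total = embed + lm_head + n_layers * (attn + mlp)

print(f"{total / 1e9:.2f}B parameters")
```

The estimate lands at roughly 6.7B parameters, consistent with the "7B" label.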
2. Methodological Foundations and Quality Feature Engineering
Explicit Indicator Features (OpenDecoder)
For each retrieved document, OpenDecoder computes:
- $s^{\mathrm{ret}}_i$: normalized retriever dot-product similarity.
- $s^{\mathrm{llm}}_i$: LLM-based relevance logits for $d_i$.
- $s^{\mathrm{qpp}}_i$: logit output from a QPP model.
- Combined via weighted aggregation: $s_i = w_{\mathrm{ret}}\, s^{\mathrm{ret}}_i + w_{\mathrm{llm}}\, s^{\mathrm{llm}}_i + w_{\mathrm{qpp}}\, s^{\mathrm{qpp}}_i$.
- Scores are normalized (max or min-max) prior to attention gating, with query/instruction tokens set to 1.
These features serve as scalar weights attached to document tokens, guiding the attention mechanism to prioritize high-quality context (Mo et al., 13 Jan 2026).
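A minimal sketch of this fusion step; the equal default weights, the min-max normalization, and the function names are illustrative assumptions rather than the paper's exact configuration.

```python
def min_max(xs):
    """Min-max normalize a list of scores into [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [0.5] * len(xs) if hi == lo else [(x - lo) / (hi - lo) for x in xs]

def fuse_scores(ret, llm, qpp, w=(1/3, 1/3, 1/3)):
    """Weighted aggregation of the three per-document indicator features."""
    ret, llm, qpp = min_max(ret), min_max(llm), min_max(qpp)
    return [w[0] * r + w[1] * l + w[2] * q for r, l, q in zip(ret, llm, qpp)]

# Three retrieved documents: similarity, LLM relevance logits, QPP logits.
s = fuse_scores(ret=[0.82, 0.41, 0.77], llm=[2.1, -0.3, 1.5], qpp=[0.9, 0.2, 0.6])
```

The fused values would then populate the document-token positions of the attention mask, with query and instruction tokens fixed at 1.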
Instructional Data Sourcing (Gervásio PT)
Gervásio PT uses instruction-tuning data derived from:
- Manual translation and augmentation of GLUE (MRPC, RTE, STS-B, WNLI) and SuperGLUE (BoolQ, CB, COPA, MultiRC) tasks into PTBR and PTPT Portuguese.
- Templates embed zero-shot/few-shot demonstrations.
- Large-scale augmentation (e.g., answer→question generation), expanding the data to over 160,000 examples and 68M tokens per Portuguese variant.
- Training is performed using Hugging Face Transformers, Accelerate, FlashAttention, DeepSpeed, and SentencePiece BPE encoding (Santos et al., 2024).
Training runs for two epochs with a fixed learning rate and an effective batch size of 256 (via gradient accumulation).
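A hypothetical illustration of how such templates can embed few-shot demonstrations, here for an RTE-style entailment task in Portuguese; the wording and field names are invented for illustration and are not the released templates.

```python
def build_prompt(task_instruction, demos, query):
    """Assemble an instruction prompt with optional few-shot demonstrations."""
    parts = [task_instruction]
    for d in demos:
        parts.append(
            f"Premissa: {d['premise']}\nHipótese: {d['hypothesis']}\nResposta: {d['label']}"
        )
    # The query ends with an open "Resposta:" slot for the model to complete.
    parts.append(f"Premissa: {query['premise']}\nHipótese: {query['hypothesis']}\nResposta:")
    return "\n\n".join(parts)

demos = [{"premise": "O gato dorme no sofá.",
          "hypothesis": "Há um gato no sofá.",
          "label": "Sim"}]
prompt = build_prompt(
    "Indique se a hipótese decorre da premissa (Sim ou Não).",
    demos,
    {"premise": "Choveu a noite toda.", "hypothesis": "O chão está seco."},
)
```

With an empty `demos` list the same function yields the zero-shot variant of the template.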
3. Training Paradigms and Post-training Strategies
OpenDecoder Post-training Integration
OpenDecoder introduces an additional attention-gating parameter set, trained via standard next-token log-likelihood under the modified attention; no new regularizers are introduced. After training, the gating module can be applied to any compatible LLM as a post-training plugin. Robustness to noisy retrieval is explicitly instilled by exposing the model to synthetic low-quality document perturbations during training (Mo et al., 13 Jan 2026).
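The perturbation idea can be sketched as follows; the function name, the noise rates, and their mapping to the Noisy/Extreme regimes are illustrative assumptions, not the paper's exact procedure.

```python
import random

def perturb_retrieval(docs, distractors, noise_rate, rng):
    """Replace a fraction of retrieved documents with irrelevant distractors,
    simulating degraded retrieval during training."""
    docs = list(docs)
    n_swap = round(noise_rate * len(docs))
    for i in rng.sample(range(len(docs)), n_swap):
        docs[i] = rng.choice(distractors)
    return docs

rng = random.Random(7)
retrieved = ["doc_a", "doc_b", "doc_c", "doc_d"]
noisy = perturb_retrieval(retrieved, ["junk_1", "junk_2"], noise_rate=0.5, rng=rng)
extreme = perturb_retrieval(retrieved, ["junk_1", "junk_2"], noise_rate=1.0, rng=rng)
```

Training on such mixtures forces the gating mechanism to learn to discount tokens from low-quality context rather than trusting the retriever blindly.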
Gervásio PT Scaling and Resource Management
Gervásio PT training is performed on 16 × A100 40GB GPUs but is designed to allow inference and (sharded) fine-tuning on consumer GPUs with ≥24GB VRAM. All necessary data, scripts, and preprocessed resources are released to enable full reproducibility. The model can generate at ~12 tokens per second on a single GPU using standard Hugging Face inference drivers (Santos et al., 2024).
4. Performance Evaluation and Empirical Findings
Robustness and Quality Gains (OpenDecoder)
OpenDecoder is empirically evaluated on five QA tasks (NaturalQuestions, TriviaQA, PopQA, HotpotQA, 2WikiMultiHopQA) under three noise regimes: Normal, Noisy (partial/irrelevant docs), and Extreme (all irrelevant). Major results:
| Setting | Method | F1 Score | EM Score |
|---|---|---|---|
| Normal | RbFT | 34.22 | 31.34 |
| Normal | OpenDecoder | 34.87 | 32.02 |
| Noisy | RbFT | 32.14 | 28.72 |
| Noisy | OpenDecoder | 34.16 | 30.81 |
| Extreme | RbFT | 25.53 | 21.90 |
| Extreme | OpenDecoder | 27.69 | 23.91 |
OpenDecoder demonstrates statistically significant improvements over state-of-the-art baselines. Guidance by retriever scores alone provides substantial gains, with further robustness and multi-hop performance realized by aggregating all three indicator features (Mo et al., 13 Jan 2026).
Portuguese LLM Benchmarks (Gervásio PT)
Zero-shot and few-shot results on translated GLUE/SuperGLUE and native PTBR tasks:
| Model | MRPC | RTE | COPA | ENEM 2022 | BLUEX | RTE (native) | STS (native) |
|---|---|---|---|---|---|---|---|
| Gervásio PTBR | 0.78 | 0.83 | 0.21 | 0.20 | 0.26 | 0.75 | 0.21 |
| LLaMA 2 7B | 0.04 | 0.05 | 0.49 | 0.25 | 0.29 | 0.09 | 0.10 |
| LLaMA 2 Chat | 0.54 | 0.38 | 0.55 | 0.22 | 0.30 | 0.55 | 0.18 |
| Sabiá-7B | — | — | — | 0.60 | 0.77 | 0.65 | 0.14 |
Gervásio PT models robustly exceed LLaMA 2 7B and match or surpass Sabiá-7B in sentence-level QA and similarity, while providing fully open licensing and reproducibility (Santos et al., 2024).
5. Licensing, Accessibility, and Deployment
Both lines of OpenDecoder work prioritize open access:
- OpenDecoder for RAG: The methodology is model-agnostic and released as a post-training modification, designed for community adoption and extension (Mo et al., 13 Jan 2026).
- Gervásio PT: Distributed under the MIT license, permitting unrestricted research and commercial use with no registration requirements. All checkpoints, translated datasets, and scripts are hosted at https://huggingface.co/PORTULAN (Santos et al., 2024).
Deployment is straightforward via Hugging Face's Python interfaces and is operational on modest hardware.
6. Limitations and Prospective Directions
For OpenDecoder in RAG:
- Score normalization and weighting remain ad hoc; adaptive or learnable fusion could enhance interpretability and effectiveness.
- Evaluated only for QA; extension to summarization or code-generation is open for future research.
- Inference cost rises with the extra score injection at each transformer layer, adding overhead per decoding step.
For Gervásio PT:
- Coverage on multi-way QA (e.g., ENEM, BLUEX) is limited by the scope of instructional data.
- Future work includes creating additional model variants (scaling up or down), expanding PT-specific corpora, refining alignment through reinforcement learning from human feedback, and exploring byte-level tokenization to reduce misalignment issues.
These efforts collectively chart the path for increasingly robust, fair, and open-access foundation models for both global and underrepresented languages.
7. Impact and Research Significance
OpenDecoder for RAG establishes a new standard for incorporating external evidence quality into LLM decoding, substantially improving robustness in noisy retrieval settings and presenting a universally applicable, post-training attention modification paradigm (Mo et al., 13 Jan 2026). Gervásio PT embodies the principle of open, replicable, and accessible LLMs for the Portuguese language, acting as a benchmark and research enabler for language technology in both academic and industrial domains (Santos et al., 2024). Together, these threads demonstrate the convergence of architectural transparency, domain adaptation, and openness as central themes in state-of-the-art LLM advancement.