Amadeus-Verbo Qwen2.5-7B for PT-BR NLP
- Amadeus-Verbo Qwen2.5-7B is a large-scale, 7B-parameter autoregressive language model fine-tuned specifically for Brazilian Portuguese using transfer learning and targeted instruction fine-tuning.
- It employs a transformer decoder with 32 layers, 4,096-dimensional embeddings, and Rotary Position Embeddings to deliver competitive performance across tasks like hate-speech detection and legal reasoning.
- The model is publicly available on HuggingFace, supporting diverse applications including chatbots, content moderation, and document understanding in educational, legal, and governmental domains.
Amadeus-Verbo Qwen2.5-7B is an LLM trained specifically for Brazilian Portuguese, built upon the technical foundations of the Qwen2.5-7B architecture and released as part of the Amadeus-Verbo family of models. Developed to democratize and accelerate open-source Brazilian Portuguese NLP research, this model leverages transfer learning and targeted instruction fine-tuning to deliver competitive performance across a broad set of academic, industrial, and governmental benchmarks. Amadeus-Verbo Qwen2.5-7B features a transformer-decoder backbone with approximately seven billion parameters, tuned for both general and instruction-following capacities in PT-BR. The model is publicly available for research and development via HuggingFace (Cruz-Castañeda et al., 20 May 2025).
1. Model Architecture and Scaling
Amadeus-Verbo Qwen2.5-7B employs an autoregressive transformer-decoder topology consistent with Qwen2.5-7B. The backbone comprises 32 layers, a model (hidden) dimension of 4,096, a feed-forward inner dimension of 16,384, and 32 attention heads. Each attention head therefore has a dimension of $4{,}096 / 32 = 128$. The architecture incorporates Rotary Position Embeddings (RoPE) for positional encoding and enables a maximum context window of 8,192 tokens during fine-tuning.
Parameter count in this configuration is computed via:

$$N \approx n_{\text{layer}}\left(4\,d_{\text{model}}^2 + 2\,d_{\text{model}}\,d_{\text{ff}}\right) + V\,d_{\text{model}}$$

with approximate scaling law:

$$N \approx 12\,n_{\text{layer}}\,d_{\text{model}}^2$$

where $n_{\text{layer}} = 32$, $d_{\text{model}} = 4{,}096$, $d_{\text{ff}} = 16{,}384$, yielding $N \approx 7 \times 10^9$ parameters. Vocabulary size $V$ is approximately 152K tokens (SentencePiece), with byte-pair merges optimized for UTF-8 encoded text.

Weights and activations are stored in bfloat16 format during fine-tuning. The 8K context window imposes $O(L^2)$ memory complexity for attention score matrices at sequence length $L$.
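The parameter approximation above can be checked with a few lines of arithmetic. This is a sketch under the stated dimensions; the exact count depends on details (gated FFN variants, biases, tied embeddings) not modeled here, and the ~152K vocabulary figure is an approximation:

```python
# Rough dense-transformer parameter count for the Qwen2.5-7B configuration.
n_layer = 32        # decoder layers
d_model = 4096      # hidden (model) dimension
d_ff = 16384        # feed-forward inner dimension
vocab = 152_000     # approximate vocabulary size

head_dim = d_model // 32            # 32 attention heads -> 128 per head
attn = 4 * d_model**2               # Q, K, V, O projections per layer
ffn = 2 * d_model * d_ff            # up- and down-projections per layer
embed = vocab * d_model             # token-embedding table

total = n_layer * (attn + ffn) + embed
print(head_dim)                  # 128
print(round(total / 1e9, 2))     # ~7.07 billion parameters
```

Note that with $d_{\text{ff}} = 4\,d_{\text{model}}$, the per-layer term reduces to the familiar $12\,d_{\text{model}}^2$ scaling law.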
2. Pretraining and Portuguese-Specific Fine-Tuning
2.1 Pretraining
Amadeus-Verbo Qwen2.5-7B builds upon the Qwen2.5 base model, initially pretrained on approximately two trillion tokens spanning multilingual web-crawl, code repositories, conversational datasets, scientific literature, news, and social media (Qwen et al., 2024). Tokenization leverages SentencePiece with an approximately 152K-token vocabulary. The pretraining objective is next-token prediction via standard autoregressive cross-entropy.
2.2 Supervised Fine-Tuning in Brazilian Portuguese
Fine-tuning on PT-BR consists of approximately 600,000 unique instruction–response pairs covering educational prompts, classification, QA, summarization, code explanation, and other knowledge domains. Data was filtered for quality, with explicit removal of profanity and hate speech, and formatted into a four-field prompt template (instruction, input, output, text). The final SFT subset uses 78,840 examples per model over two epochs, covering both informal and technical PT-BR registers.
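A four-field record of this kind can be assembled as follows. This is a minimal sketch: the field names follow the (instruction, input, output, text) layout described above, but the Portuguese template wording itself is an illustrative assumption, not taken from the paper:

```python
def build_sft_record(instruction: str, input_text: str, output: str) -> dict:
    """Assemble one SFT example in the four-field (instruction, input,
    output, text) layout; 'text' holds the fully rendered prompt."""
    # Hypothetical prompt wording -- the paper's exact template may differ.
    if input_text:
        text = (f"Instrução: {instruction}\n"
                f"Entrada: {input_text}\n"
                f"Resposta: {output}")
    else:
        text = f"Instrução: {instruction}\nResposta: {output}"
    return {"instruction": instruction, "input": input_text,
            "output": output, "text": text}

record = build_sft_record("Resuma o texto a seguir.",
                          "O modelo foi ajustado para o português brasileiro.",
                          "Modelo ajustado para PT-BR.")
print(record["text"])
```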
Hyperparameters were:
- AdamW optimizer
- Cosine learning-rate decay from the peak learning rate, with a $0.05$ warmup ratio
- Batch size $1$, with gradient accumulation tuned for GPU saturation
- bfloat16 mixed precision and activation checkpointing enabled
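These hyperparameters map naturally onto a Hugging Face-style training configuration. The sketch below expresses them as a plain dictionary of standard `TrainingArguments`-style keys; the gradient-accumulation value is an illustrative assumption ("tuned for GPU saturation" in the source), not a reported figure:

```python
# Fine-tuning configuration mirroring the hyperparameters listed above.
sft_config = {
    "optim": "adamw_torch",             # AdamW optimizer
    "lr_scheduler_type": "cosine",      # cosine learning-rate decay
    "warmup_ratio": 0.05,               # 5% of steps spent warming up
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 16,  # assumption: tuned per GPU memory
    "bf16": True,                       # bfloat16 mixed precision
    "gradient_checkpointing": True,     # activation checkpointing
    "num_train_epochs": 2,
}

# Effective global batch on an 8-GPU node = micro-batch x accumulation x GPUs.
effective_batch = (sft_config["per_device_train_batch_size"]
                   * sft_config["gradient_accumulation_steps"] * 8)
print(effective_batch)  # 128
```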
Fine-tuning was executed on AWS p5.48xlarge instances (8× NVIDIA H100 GPUs, 112 TFLOPS aggregate, 1,752 GPU-hours per run), with all computation performed in bfloat16.
3. Evaluation and Benchmarks
Evaluation utilized both intrinsic and extrinsic metrics.
3.1 Intrinsic Metrics
Perplexity on held-out PT-BR corpora is not reported in the technical references.
3.2 Extrinsic Benchmarks
LM-Eval-Harness-PT was used, encompassing nine tasks spanning NLI, RTE, QA, the national exam (ENEM), hate-speech detection, sentiment analysis, and the legal bar exam (OAB). Amadeus-Verbo Qwen2.5-7B was assessed in few-shot regimes (3–25 examples).
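Few-shot evaluation of this kind amounts to prepending k solved demonstrations to each query before scoring the model's completion. A minimal sketch (the task data and Portuguese labels below are hypothetical examples, not drawn from the benchmark):

```python
def few_shot_prompt(examples, query, k=3):
    """Build a k-shot classification prompt from (text, label) pairs."""
    shots = [f"Texto: {t}\nRótulo: {l}" for t, l in examples[:k]]
    # The final segment leaves the label blank for the model to complete.
    return "\n\n".join(shots + [f"Texto: {query}\nRótulo:"])

demos = [("Ótimo atendimento!", "positivo"),
         ("Produto chegou quebrado.", "negativo"),
         ("Entrega dentro do prazo.", "positivo")]
prompt = few_shot_prompt(demos, "Não recomendo esta loja.", k=3)
print(prompt.count("Rótulo:"))  # 4: three demonstrations plus the query
```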
3.3 Comparative Performance
Selected task scores (macro F1 and Pearson correlation) illustrate consistent improvement over the Qwen2.5-7B base model:
| Task | Metric | Qwen2.5-7B | Amadeus-Verbo-7B |
|---|---|---|---|
| bluex (RC) | F1 Macro | 0.64 | 0.65 |
| hate_speech | F1 Macro | 0.72 | 0.74 |
| assin2_rte | F1 Macro (RTE) | 0.94 | 0.94 |
| assin2_sts | Pearson (STS) | 0.76 | 0.78 |
A plausible implication is that targeted PT-BR instruction tuning yields 1–5 point lifts in STS, hate-speech, and RC over the Qwen2.5 base on Portuguese tasks (Cruz-Castañeda et al., 20 May 2025).
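The per-task lifts can be read off the table directly (scores below are reproduced from the table above and expressed in points, i.e. hundredths):

```python
scores = {  # task: (Qwen2.5-7B base, Amadeus-Verbo-7B)
    "bluex":       (0.64, 0.65),
    "hate_speech": (0.72, 0.74),
    "assin2_rte":  (0.94, 0.94),
    "assin2_sts":  (0.76, 0.78),
}
lifts = {task: round((tuned - base) * 100, 1)
         for task, (base, tuned) in scores.items()}
print(lifts)
# {'bluex': 1.0, 'hate_speech': 2.0, 'assin2_rte': 0.0, 'assin2_sts': 2.0}
```

On these selected tasks the lifts sit at the lower end of the 1–5 point range cited in the text, with the already-saturated RTE task unchanged.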
4. Use Cases and Limitations
4.1 Recommended Applications
- Instruction following: specialized chatbots and virtual assistants in Brazilian Portuguese
- Document understanding: question answering and summarization of PT-BR news, law, and educational texts
- Content moderation: hate-speech/offensive-language detection
- Sentiment analysis/social media monitoring
- Legal/educational: OAB bar exam, ENEM question answering
4.2 Failure Modes
- Occasional hallucinations when context is insufficient or domain is out-of-distribution
- Sensitivity to prompt phrasing; subtle instructions can be ignored
- Degraded performance beyond the 8K-token context window or on rare dialects
4.3 Mitigation Strategies
- Retrieval-augmented generation (RAG) with trusted knowledge bases
- Standardized instruction-prompt templates (prompt engineering)
- Fact-checking modules or small high-precision classifiers
- Continued SFT using domain-specific corpora
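The first mitigation, retrieval-augmented generation, amounts to prepending retrieved passages to the prompt so the model answers from trusted context rather than parametric memory. A minimal sketch with retrieval stubbed out (the passage store, Portuguese wording, and examples are hypothetical):

```python
def rag_prompt(question: str, passages: list[str]) -> str:
    """Prepend numbered retrieved passages as grounding context."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (f"Contexto:\n{context}\n\n"
            f"Pergunta: {question}\n"
            f"Responda usando apenas o contexto acima.")

# Stub retrieval: in practice, passages come from a trusted knowledge base
# (e.g. a vector index over legal or educational documents).
retrieved = ["A prova da OAB avalia conhecimentos jurídicos.",
             "O ENEM é o exame nacional do ensino médio."]
prompt = rag_prompt("O que avalia a prova da OAB?", retrieved)
print(prompt)
```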
5. Context within the Qwen2.5 Ecosystem
Amadeus-Verbo Qwen2.5-7B occupies the 7B parameter scale within the Amadeus-Verbo and Qwen2.5 families. It inherits architectural features, pre-training regimes, and optimization procedures from Qwen2.5, distinguishing itself through targeted supervised fine-tuning for PT-BR. It facilitates applications in resource-constrained environments due to its efficient footprint and competitive accuracy. Amadeus-Verbo Qwen2.5-7B is distributed via HuggingFace and represents a blueprint for rapid LLM development in other low-resource languages when sufficient native data is available (Cruz-Castañeda et al., 20 May 2025).
6. Model Accessibility and Deployment
The model and all Amadeus-Verbo variants (spanning 0.5B to 72B parameters) are available through HuggingFace Collections, with open-source weights, configuration files, and documentation (Cruz-Castañeda et al., 20 May 2025). The design supports deployment on both GPUs and CPU/edge devices, contingent on model quantization and hardware capability. Continued application-specific fine-tuning is achievable with existing workflows, and the base models are suited to serving as foundational layers for more specialized domain adaptation.
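Whether CPU or edge deployment is feasible follows largely from weight-storage arithmetic. The rough sketch below ignores activation and KV-cache memory and assumes the nominal 7B parameter count; quantization formats and their per-parameter byte costs are standard, not specific to this model:

```python
params = 7e9  # nominal parameter count
bytes_per_param = {"bf16": 2, "int8": 1, "int4": 0.5}

# Weight footprint in GiB for each storage format.
footprint_gb = {fmt: round(params * b / 2**30, 1)
                for fmt, b in bytes_per_param.items()}
print(footprint_gb)  # {'bf16': 13.0, 'int8': 6.5, 'int4': 3.3}
```

Thus 4-bit quantization brings the weights within reach of a single consumer GPU or a well-provisioned CPU host, at some accuracy cost.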
7. Common Misconceptions and Clarifications
No direct side-by-side comparisons with LLaMA-2-7B or Qwen2.0-7B are reported in the technical references for Amadeus-Verbo Qwen2.5-7B. The incremental gains over the Qwen2.5 base should be interpreted within the context of extrinsic PT-BR benchmarks and not generalized to other unrelated tasks. Furthermore, the architecture does not introduce new modules, sparse experts, or bespoke inference optimizations; all improvements derive from careful data curation and instruction tuning.
Amadeus-Verbo Qwen2.5-7B demonstrates the feasibility of leveraging established LLM architectures for high-accuracy, resource-efficient Brazilian Portuguese NLP applications through systematic fine-tuning and domain-specific evaluation.