
Amadeus-Verbo Qwen2.5-7B for PT-BR NLP

Updated 30 January 2026
  • Amadeus-Verbo Qwen2.5-7B is a large-scale, 7B-parameter autoregressive language model fine-tuned specifically for Brazilian Portuguese using transfer learning and targeted instruction.
  • It employs a transformer decoder with 32 layers, 4,096-dimensional embeddings, and Rotary Position Embeddings to deliver competitive performance across tasks like hate-speech detection and legal reasoning.
  • The model is publicly available on HuggingFace, supporting diverse applications including chatbots, content moderation, and document understanding in educational, legal, and governmental domains.

Amadeus-Verbo Qwen2.5-7B is an LLM specifically trained for Brazilian Portuguese, built upon the technical foundations of the Qwen2.5-7B architecture and released as part of the Amadeus-Verbo family of models. Developed to democratize and accelerate open-source Brazilian Portuguese NLP research, this model exploits transfer learning and targeted instruction fine-tuning to deliver competitive performance across a broad set of academic, industrial, and governmental benchmarks. Amadeus-Verbo Qwen2.5-7B features a transformer decoder backbone with approximately seven billion parameters, tuned for both general and instruction-following capacities in PT-BR. The model is publicly available for research and development via HuggingFace (Cruz-Castañeda et al., 20 May 2025).

1. Model Architecture and Scaling

Amadeus-Verbo Qwen2.5-7B employs an autoregressive transformer-decoder topology consistent with Qwen2.5-7B. The backbone comprises 32 layers, a model (hidden) dimension of 4,096, a feed-forward inner dimension of 16,384, and 32 attention heads. Each attention head therefore has dimension d_head = d/h = 4096/32 = 128. The architecture incorporates Rotary Position Embeddings (RoPE) for positional encoding and enables a maximum context window of 8,192 tokens during fine-tuning.
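
The rotation applied by RoPE can be sketched in a few lines. This is a minimal, pure-Python illustration of the standard RoPE formulation (pairwise rotation with angles position · base^(-2i/d)), not the model's actual implementation:

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Apply Rotary Position Embeddings to one head-sized vector.

    Pairs of dimensions (2i, 2i+1) are rotated by the angle
    position * base^(-2i/d), so relative token offsets become
    visible to attention through query-key inner products.
    """
    d = len(vec)
    out = [0.0] * d
    for i in range(0, d, 2):
        theta = position * (base ** (-i / d))
        c, s = math.cos(theta), math.sin(theta)
        out[i] = vec[i] * c - vec[i + 1] * s
        out[i + 1] = vec[i] * s + vec[i + 1] * c
    return out

# Position 0 leaves the vector unchanged (all rotation angles are zero),
# and rotation preserves the vector's norm at any position.
v = [1.0, 0.0] * 64  # one 128-dim head vector, matching d_head above
assert rope_rotate(v, 0) == v
```

Because the transformation is a pure rotation, attention scores between two tokens depend only on their relative offset, which is the property that lets RoPE extrapolate across the 8K context window.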

Parameter count in this configuration is computed via:

N_total ≈ N_embed + L · (N_self-attn + N_ffn + N_norms)

with approximate scaling law:

N_total ≃ 12 · L · d²

where L = 32 and d = 4,096, yielding N_total ≃ 6.4 × 10⁹ parameters. Vocabulary size is ~150,000 tokens (SentencePiece), with byte-pair merges optimized for UTF-8 encoded text.
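
The scaling law above can be checked arithmetically. The sketch below follows the figures quoted in this section (L = 32, d = 4,096, 4d FFN inner dimension, ~150k vocabulary) and ignores gated-activation and norm parameters, so it is a back-of-envelope estimate rather than an exact count:

```python
# Back-of-envelope parameter count for the backbone described above.
# The 12*L*d^2 rule of thumb decomposes as 4*d^2 for the Q/K/V/O
# attention projections plus 8*d^2 for a 4d-wide feed-forward block.
L, d, vocab = 32, 4096, 150_000

attn = 4 * d * d              # Q, K, V, O projections per layer
ffn = 2 * d * (4 * d)         # up- and down-projection (16,384 inner dim)
backbone = L * (attn + ffn)   # == 12 * L * d^2 exactly
embed = vocab * d             # token embedding table

print(f"backbone ≈ {backbone / 1e9:.1f}B, + embeddings ≈ {(backbone + embed) / 1e9:.1f}B")
# → backbone ≈ 6.4B, + embeddings ≈ 7.1B
```

The 6.4 × 10⁹ backbone figure plus the embedding table is what brings the total to the nominal "7B" scale.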

Weights and activations are stored in bfloat16 format during fine-tuning. The 8K context window imposes O(L · N²) memory complexity for attention states.

2. Pretraining and Portuguese-Specific Fine-Tuning

2.1 Pretraining

Amadeus-Verbo Qwen2.5-7B builds upon the Qwen2.5 base model, initially pretrained on approximately two trillion tokens spanning multilingual web-crawl, code repositories, conversational datasets, scientific literature, news, and social media (Qwen et al., 2024). Tokenization leverages SentencePiece with a ~150k-token vocabulary. The pre-training objective is next-token prediction via standard autoregressive cross-entropy.

2.2 Supervised Fine-Tuning in Brazilian Portuguese

Fine-tuning on PT-BR consists of approximately 600,000 unique instruction–response pairs covering educational prompts, classification, QA, summarization, code explanation, and other knowledge domains. Data was filtered for quality, with explicit removal of profanity and hate speech, and formatted into a four-field prompt template (instruction, input, output, text). The final SFT subset uses 78,840 examples per model (two epochs, max_length = 8,192 tokens), covering both informal and technical PT-BR registers.
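
The four-field layout resembles the common Alpaca-style SFT convention. A minimal sketch of assembling one such record follows; the field wording and prompt phrasing here are illustrative assumptions, not the authors' exact template:

```python
def format_example(instruction: str, inp: str, output: str) -> dict:
    """Assemble one SFT record in a four-field (instruction, input,
    output, text) layout; 'text' is the full concatenated prompt the
    model is trained on. Prompt wording is illustrative only."""
    prompt = f"Instrução: {instruction}\n"
    if inp:  # the input field is optional for instruction-only tasks
        prompt += f"Entrada: {inp}\n"
    prompt += f"Resposta: {output}"
    return {
        "instruction": instruction,
        "input": inp,
        "output": output,
        "text": prompt,
    }

ex = format_example(
    "Resuma o texto a seguir em uma frase.",
    "O modelo foi ajustado com 600 mil pares de instrução-resposta.",
    "O modelo passou por ajuste fino com 600 mil pares.",
)
```

Keeping the raw fields alongside the concatenated `text` makes it easy to re-render the corpus under a different chat template later.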

Hyperparameters were:

  • AdamW optimizer (β₁ = 0.9, β₂ = 0.95, ε = 1×10⁻⁸, weight_decay = 0.01)
  • Cosine learning-rate decay with 0.05 warmup ratio, peak LR 1×10⁻⁵
  • Batch size 1, gradient accumulation tuned for GPU saturation, max_grad_norm = 1.0
  • bfloat16 mixed precision and activation checkpointing enabled
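
The schedule in the second bullet can be sketched as a small function. This is a generic linear-warmup-plus-cosine-decay implementation consistent with the stated ratio and peak LR, not the authors' training code:

```python
import math

def lr_at(step: int, total_steps: int, peak_lr: float = 1e-5,
          warmup_ratio: float = 0.05) -> float:
    """Cosine decay with linear warmup: LR ramps linearly to peak_lr
    over the first 5% of steps, then follows a half-cosine to zero."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 10_000
assert lr_at(0, total) == 0.0                     # starts from zero
assert abs(lr_at(500, total) - 1e-5) < 1e-15     # warmup ends at peak LR
```

The warmup phase protects the pretrained weights from large early updates, which matters when fine-tuning with the small batch size listed above.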

Fine-tuning was executed on AWS p5.48xlarge instances (8× NVIDIA H100 GPUs, ~112 TFLOPS aggregate, 1,752 GPU-hours per run). Estimated compute: ~1.2 × 10¹⁸ bfloat16 FLOPs.

3. Evaluation and Benchmarks

Evaluation utilized both intrinsic and extrinsic metrics.

3.1 Intrinsic Metrics

Although perplexity on held-out PT-BR corpora is not reported, typical tokenization achieves ~4 tokens/sentence in Portuguese.

3.2 Extrinsic Benchmarks

LM-Eval-Harness-PT was used, encompassing nine tasks across NLI, RTE, QA, national exam (ENEM), hate-speech, sentiment, and legal bar exam domains. Amadeus-Verbo Qwen2.5-7B was assessed in few-shot regimes (3–25 examples).

3.3 Comparative Performance

Selected task scores (F1 macro and Pearson) illustrate consistent improvement over the Qwen2.5-7B base model:

Task         Metric           Qwen2.5-7B   Amadeus-Verbo-7B
bluex (RC)   F1 Macro         0.64         0.65
hate_speech  F1 Macro         0.72         0.74
assin2_rte   F1 Macro (RTE)   0.94         0.94
assin2_sts   Pearson (STS)    0.76         0.78

A plausible implication is that targeted PT-BR instruction tuning yields modest lifts of 1–2 points in STS, hate-speech, and RC over the Qwen2.5 base on Portuguese tasks (Cruz-Castañeda et al., 20 May 2025).
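
Since most scores in the table are macro-averaged F1, a minimal reference implementation clarifies what is being compared: per-class F1 averaged with equal weight, so minority classes (e.g. the rare hate-speech class) count as much as majority ones:

```python
def f1_macro(y_true, y_pred):
    """Macro-averaged F1: compute F1 for each class, then take the
    unweighted mean across classes."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# A skewed toy example: missing the rare 'hate' class halves the score
# even though 3 of 4 predictions are correct.
score = f1_macro(["ok", "ok", "ok", "hate"], ["ok", "ok", "ok", "ok"])
# score ≈ 0.429 ('ok' F1 = 6/7, 'hate' F1 = 0, averaged)
```

This equal-weight averaging is why macro-F1 is the conventional choice for imbalanced tasks like hate-speech detection.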

4. Use Cases and Limitations

4.1 Primary Use Cases

  • Instruction following: specialized chatbots and virtual assistants in Brazilian Portuguese
  • Document understanding: question answering and summarization of PT-BR news, law, and educational texts
  • Content moderation: hate-speech/offensive-language detection
  • Sentiment analysis/social media monitoring
  • Legal/educational: OAB bar exam, ENEM question answering

4.2 Failure Modes

  • Occasional hallucinations when context is insufficient or domain is out-of-distribution
  • Sensitivity to prompt phrasing; subtle instructions can be ignored
  • Degraded performance beyond 8K-token context or for rare dialects

4.3 Mitigation Strategies

  • Retrieval-augmented generation (RAG) with trusted knowledge bases
  • Standardized instruction-prompt templates (prompt engineering)
  • Fact-checking modules or small high-precision classifiers
  • Continued SFT using domain-specific corpora
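
The RAG mitigation in the first bullet can be sketched end to end. The retriever below is a deliberately toy keyword-overlap scorer (a production system would use dense embeddings); the knowledge-base contents and prompt wording are illustrative assumptions:

```python
def toks(s: str) -> set[str]:
    """Lowercased, lightly de-punctuated token set."""
    return {w.strip("?.,!:;") for w in s.lower().split()}

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Toy keyword-overlap retriever: rank documents by the number of
    query tokens they share, keep the top k."""
    q = toks(query)
    return sorted(docs, key=lambda d: -len(q & toks(d)))[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Prepend retrieved passages so the model answers from trusted
    context instead of relying on parametric memory alone."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Contexto:\n{context}\n\nPergunta: {query}\nResposta:"

kb = [
    "O ENEM é o Exame Nacional do Ensino Médio.",
    "A OAB aplica o exame da ordem para advogados.",
    "O clima no Rio de Janeiro é tropical.",
]
prompt = build_prompt("O que é o exame da OAB?", kb)
```

Grounding the prompt this way directly addresses the hallucination failure mode listed above: the model is asked to answer from the supplied context rather than invent facts.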

5. Context within the Qwen2.5 Ecosystem

Amadeus-Verbo Qwen2.5-7B occupies the 7B parameter scale within the Amadeus-Verbo and Qwen2.5 families. It inherits architectural features, pre-training regimes, and optimization procedures from Qwen2.5, distinguishing itself through targeted supervised fine-tuning for PT-BR. It facilitates applications in resource-constrained environments due to its efficient footprint and competitive accuracy. Amadeus-Verbo Qwen2.5-7B is distributed via HuggingFace and represents a blueprint for rapid LLM development in other low-resource languages when sufficient native data is available (Cruz-Castañeda et al., 20 May 2025).

6. Model Accessibility and Deployment

The model and all Amadeus-Verbo variants (spanning 0.5B to 72B parameters) are available through HuggingFace Collections, with open-source weights, configuration files, and documentation (Cruz-Castañeda et al., 20 May 2025). The design supports deployment both on GPU and advanced CPU/edge devices, contingent on model quantization and hardware capability. Continued application-specific fine-tuning is achievable with existing workflows, and base models are suited to serving as foundational layers for more specialized domain adaptation.
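
Whether a given variant fits on GPU, CPU, or edge hardware is largely a weight-memory question. A rough estimate for the 7B model under common quantization widths (weights only; activations and KV cache add more on top):

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate weight-only memory footprint in GiB."""
    return n_params * bits_per_param / 8 / 1024**3

n = 7e9  # nominal parameter count of the 7B variant
for name, bits in [("bf16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name}: {weight_memory_gb(n, bits):.1f} GiB")
# → bf16: 13.0 GiB
# → int8: 6.5 GiB
# → int4: 3.3 GiB
```

This is why int8/int4 quantization is the usual path to CPU and edge deployment, while full-precision bf16 serving comfortably requires a data-center GPU.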

7. Common Misconceptions and Clarifications

No direct side-by-side comparisons with LLaMA-2-7B or Qwen2.0-7B are reported in the technical references for Amadeus-Verbo Qwen2.5-7B. The incremental gains over the Qwen2.5 base should be interpreted within the context of extrinsic PT-BR benchmarks and not generalized to other unrelated tasks. Furthermore, the architecture does not introduce new modules, sparse experts, or bespoke inference optimizations; all improvements derive from careful data curation and instruction tuning.

Amadeus-Verbo Qwen2.5-7B demonstrates the feasibility of leveraging established LLM architectures for high-accuracy, resource-efficient Brazilian Portuguese NLP applications through systematic fine-tuning and domain-specific evaluation.
