
Sunflower 14B and 32B: Ugandan Language Models

Updated 10 October 2025
  • Sunflower 14B and 32B are instruction-tuned Qwen-3 models designed for high-accuracy translation across 31 Ugandan languages.
  • They employ LoRA-based fine-tuning and reinforcement learning with Direct Preference Optimization to reduce hallucinations and improve cultural context handling.
  • Applications include government, healthcare, education, and community services, effectively preserving cultural nuances and bridging digital language gaps.

Sunflower 14B and 32B refer to a pair of instruction-tuned LLMs developed to achieve state-of-the-art comprehension and practical utility across the major Ugandan languages. Their distinguishing feature is a rigorous regional focus, which shapes both the technical approach and the composition of the training data. Both models employ the Qwen-3 transformer architecture and are open-sourced to facilitate deployment in high-impact, multilingual settings such as government, healthcare, and education.

1. Model Architecture and Instruction Fine-Tuning

The Sunflower models exist in two parameter scales—14 billion (14B) and 32 billion (32B)—with the latter generally yielding higher translation accuracy for low-resource languages. Both architectures are derived from Qwen-3 and utilize standard transformer mechanisms.

Instruction fine-tuning is performed to adapt the models for a wide range of tasks, including:

  • Bidirectional translation (xx→eng, eng→xx) across 31 Ugandan languages (a minimal query sketch follows this list)
  • Context-sensitive question answering and summarization
  • Culturally-specific and creative queries
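Because both checkpoints are released as instruction-tuned chat models, a translation request can be issued through a standard chat interface. The sketch below is illustrative only; the repository identifier and prompt wording are assumptions rather than confirmed details of the release.

```python
# Illustrative eng→xx translation query against an instruction-tuned,
# Qwen-3-based checkpoint. The model id below is a hypothetical placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sunbird/Sunflower-14B"  # assumed repository id, not confirmed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{
    "role": "user",
    "content": "Translate to Luganda: Please bring your vaccination card to the clinic.",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The same pattern applies to the xx→eng direction and to the question-answering and summarization tasks listed above.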

Supervised fine-tuning incorporates LoRA (Low-Rank Adaptation) adapters of rank 16, with the training loss computed only on response-side tokens in the chat data. This design reduces VRAM requirements and acts as an additional regularization mechanism, discouraging instruction echoing and other artifacts common in multi-turn conversation modeling.
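A minimal sketch of how this configuration could be expressed with the `peft` library is shown below; the base checkpoint, target modules, scaling factor, and masking helper are illustrative assumptions rather than details taken from the Sunflower training recipe.

```python
# Sketch: rank-16 LoRA adapters with loss restricted to response tokens.
# Base checkpoint, target modules, and alpha are assumptions for illustration.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-14B")

lora_cfg = LoraConfig(
    r=16,                                                     # rank 16, as stated above
    lora_alpha=32,                                            # assumed scaling factor
    lora_dropout=0.05,                                        # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)

def build_labels(prompt_ids: list[int], response_ids: list[int]) -> list[int]:
    """Mask the prompt with -100 so the cross-entropy loss is computed
    only on the response-side tokens of each chat turn."""
    return [-100] * len(prompt_ids) + list(response_ids)
```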

A subsequent reinforcement learning phase applies a variant of Direct Preference Optimization (Iterative Reasoning Preference Optimization), in which each prompt is paired with both preferred and dispreferred candidate completions. The DPO loss is augmented (mixing parameter $\alpha_\mathrm{RPO} = 1.0$) to support complex preference ranking, with specific targeting of repetitive glitch loops and hallucinations that remained after initial fine-tuning.
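A common way to write such an augmented objective, following the Iterative RPO formulation (the exact form used for Sunflower is an assumption here), adds a length-normalized negative log-likelihood term on the preferred completion $y^{+}$ to the standard DPO loss over preferred and dispreferred pairs:

$$\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\left( \beta \log \frac{\pi_\theta(y^{+}\mid x)}{\pi_{\mathrm{ref}}(y^{+}\mid x)} \;-\; \beta \log \frac{\pi_\theta(y^{-}\mid x)}{\pi_{\mathrm{ref}}(y^{-}\mid x)} \right)$$

$$\mathcal{L} = \mathcal{L}_{\mathrm{DPO}} \;+\; \alpha_{\mathrm{RPO}} \cdot \frac{-\log \pi_\theta(y^{+}\mid x)}{\lvert y^{+} \rvert}$$

With $\alpha_\mathrm{RPO} = 1.0$, the likelihood term carries the same weight as the preference term, anchoring the policy to preferred completions while it is pushed away from glitchy or hallucinated ones.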

2. Training Data Acquisition and Curation

The training data for Sunflower 14B and 32B is drawn from diverse and multimodal sources:

  • Digital corpora including MADLAD-400, FLORES-200, Makerere MT Corpus, and SALT (parallel datasets for Luganda, Acholi, etc.)
  • Web-scraped Ugandan news pieces, blogs, and community forums
  • Printed educational and literary materials, digitized via OCR, with normalization to correct for diacritics and scanning errors
  • Over 500 hours of talk-show and other audio, transcribed by a Whisper-Large v3 model fine-tuned for ten Ugandan languages (a transcription sketch follows this list)
  • Community-sourced cultural documents, including folklore, proverbs, and phrasebooks
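A minimal transcription sketch is given below; it uses the public `openai/whisper-large-v3` checkpoint as a stand-in, since the identifier of the Ugandan-language fine-tune is not specified here.

```python
# Sketch: transcribing long talk-show recordings with a Whisper-Large v3 pipeline.
# "openai/whisper-large-v3" is the public base checkpoint; the Ugandan-language
# fine-tune described above would be substituted in practice.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    chunk_length_s=30,        # split long broadcasts into 30 s chunks
    return_timestamps=True,   # keep timestamps for later alignment and cleanup
)

result = asr("talkshow_episode.wav")  # placeholder file name
print(result["text"])
```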

For languages with especially sparse resources, back-translation augmentation is utilized: synthetic examples are generated using an NLLB-based translation engine, which helps mitigate extreme data imbalance.
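A minimal sketch of this augmentation step, assuming a public NLLB checkpoint and the Luganda→English direction for illustration:

```python
# Sketch: creating a synthetic parallel pair by back-translating monolingual
# Luganda into English with a public NLLB checkpoint (an assumed stand-in for
# the NLLB-based engine mentioned above).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

ckpt = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(ckpt, src_lang="lug_Latn")  # Luganda source
model = AutoModelForSeq2SeqLM.from_pretrained(ckpt)

def back_translate(luganda_sentence: str) -> str:
    """Return a synthetic English side for an authentic Luganda sentence."""
    inputs = tokenizer(luganda_sentence, return_tensors="pt")
    generated = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
        max_new_tokens=64,
    )
    return tokenizer.decode(generated[0], skip_special_tokens=True)

# Pairing each monolingual Luganda sentence with its machine-translated English
# side yields extra eng↔lug training examples for the low-resource direction.
```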

3. Regionally Focused Comprehension and Linguistic Transfer

The models are engineered to exploit linguistic structure typical of the region—agglutinative morphology, shared phonetics, and interwoven cultural context—rather than attempting pan-African coverage.

This focus facilitates substantial cross-lingual transfer within the group. For example, the presence of shared grammatical constructs across many Ugandan languages yields denser coverage and improved performance even for dialects with few native training samples. The inclusion of oral, printed, and community-sourced data enables the models to process practical or culturally grounded queries, such as legal procedures, healthcare instructions, or local idioms.

Performance is evaluated with automatic translation metrics. For instance, Sunflower-32B attains a mean xx→eng chrF score of approximately $0.435$, outperforming generalist models (e.g., Gemini 2.5 Pro, GPT-4o) on test suites scored with BLEU, chrF, CER, and WER across 31 languages.
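The following sketch shows how such scores could be computed with the `sacrebleu` and `jiwer` libraries; the example hypothesis and reference are placeholders, and `sacrebleu` reports chrF on a 0-100 scale, so the normalized figure above corresponds to a score of roughly 43.5.

```python
# Sketch: computing the reported automatic metrics for one language direction.
# Hypotheses and references are placeholders; sacrebleu reports chrF/BLEU on a
# 0-100 scale, so the normalized ~0.435 above corresponds to roughly 43.5.
import sacrebleu
from jiwer import cer, wer

hypotheses = ["The clinic opens at eight in the morning."]
references = [["The clinic opens at 8 a.m."]]  # one reference stream

chrf = sacrebleu.corpus_chrf(hypotheses, references).score / 100.0
bleu = sacrebleu.corpus_bleu(hypotheses, references).score
char_err = cer(references[0][0], hypotheses[0])  # character error rate
word_err = wer(references[0][0], hypotheses[0])  # word error rate

print(f"chrF={chrf:.3f}  BLEU={bleu:.1f}  CER={char_err:.3f}  WER={word_err:.3f}")
```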

4. Practical Applications and Impact

Sunflower 14B and 32B address key needs:

| Application Domain | Impact | Features |
|---|---|---|
| Machine Translation | Reduces barriers in government, healthcare, education | Bidirectional support for 31 languages; leading chrF performance |
| Community Access | Empowers users in civic, legal, commercial engagements | Handles culturally grounded queries, proverbs, idioms |
| Preservation of Culture | Supports digitization and interaction with oral/printed forms | Handles non-standard, low-resource forms and folklore |

Community-driven evaluation (in-person and online) is incorporated into the feedback loop, making the models responsive to native speaker concerns and deployment realities.

5. Developmental Challenges

Several significant challenges were encountered in constructing Sunflower 14B and 32B:

  • Scarcity of High-Quality Digital Data: Many languages required text digitization or transcription, resulting in OCR artifacts and difficulties with consistent normalization.
  • Resource Imbalance: Languages with fewer speakers and documents necessitated synthetic augmentation via back-translation, but balancing against overfitting to high-resource languages remained problematic.
  • Multidomain, Multilingual Robustness: The data mix across domains (news, education, informal conversation) forced the models to handle rapid context shifts and code-switching.
  • Mitigation of Glitching/Hallucination: Infinite loops and hallucinated outputs persisted despite instruction fine-tuning, leading to the adoption of preference optimization in the RL phase (e.g., DPO with mixing parameter control).

These issues have been mitigated to varying extents but remain active areas of research and development.

6. Evaluation and Metrics

Performance metrics are reported in terms of translation and comprehension, with comparisons to other open and proprietary LLMs:

| Model | Mean xx→eng chrF | BLEU & Other Metrics | Domains Evaluated |
|---|---|---|---|
| Sunflower-32B | 0.435 | Highest across 31 languages | Gov, Health, Edu |
| Gemini 2.5 Pro | Lower | Varied | Generalist |
| GPT-4o | Lower | Varied | Generalist |

One illustrative mathematical snippet used in evaluation is:

Solving $x^2 + 2 = 6$: subtract $2$ to obtain $x^2 = 4$; taking the square root gives $x = \pm 2$.

Such examples demonstrate both technical instructional capability and syntactic accuracy for mathematical content.

7. Future Directions and Limitations

A plausible implication is that the regionally focused paradigm seen in Sunflower 14B and 32B could be extended to other multi-language localities with similar sociolinguistic structures. Nonetheless, scalability to pan-African or global coverage is constrained by data availability, transfer learning limitations, and resource requirements.

Issues such as glitching, hallucination, and uneven language resource representation suggest the necessity for continued innovation in RL phase design, feedback integration, and synthetic data generation. Hallucination reduction and robust code-switching remain persistent challenges.

In summary, Sunflower 14B and 32B represent a comprehensive approach to localized language modeling, leveraging targeted data strategies and instruction-tuned architectures to set new performance standards for Ugandan languages—with broader implications for culturally-sensitive NLP and the design of multilingual LLMs (Akera et al., 8 Oct 2025).
