Introducing Bode: A Fine-Tuned Large Language Model for Portuguese Prompt-Based Task (2401.02909v1)

Published 5 Jan 2024 in cs.CL

Abstract: LLMs are bringing increasing advances to Natural Language Processing. However, low-resource languages such as Portuguese, which are underrepresented in the datasets used for many NLP tasks or for which existing datasets are less substantial, benefit from LLMs to a lesser extent. LLMs trained on multilingual datasets often struggle to respond satisfactorily to prompts in Portuguese, exhibiting, for example, code switching in their responses. This work proposes Bode, a LLaMA 2-based model fine-tuned for Portuguese prompts, released in two versions: 7B and 13B. We evaluate the model on classification tasks using a zero-shot approach with in-context learning and compare it with other LLMs. Our main contribution is an LLM with satisfactory results in Portuguese, provided as a model that is free for research or commercial purposes.
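As a rough illustration of the zero-shot, in-context evaluation setup mentioned in the abstract, the sketch below prompts a LLaMA 2-style causal LM with a Portuguese classification instruction and reads off the predicted label. It is a minimal sketch, not the authors' evaluation code; the model id is a placeholder assumption, and the sentiment task and prompt wording are illustrative only.

```python
# Minimal sketch of zero-shot, prompt-based classification with a LLaMA 2-style
# model, using the Hugging Face transformers API. The model id below is a
# placeholder (assumption), not taken from this page; substitute the actual
# Bode checkpoint you want to evaluate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/bode-7b"  # placeholder for a Bode 7B or 13B checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Zero-shot classification: the label set is stated in the Portuguese prompt
# itself; no task-specific training examples are supplied.
prompt = (
    "Classifique o sentimento do tweet a seguir como 'positivo' ou 'negativo'.\n"
    "Tweet: Adorei o atendimento, recomendo a todos!\n"
    "Sentimento:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=5, do_sample=False)

# Decode only the newly generated tokens (the predicted label).
generated = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```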
