Advancing Generative AI for Portuguese with Open Decoder Gervásio PT* (2402.18766v2)

Published 29 Feb 2024 in cs.CL

Abstract: To advance the neural decoding of Portuguese, this paper presents a fully open, Transformer-based, instruction-tuned decoder model that sets a new state of the art in this respect. The decoder, named Gervásio PT*, was developed by taking the strong LLaMA 2 7B model as a starting point and further training it on language resources that include new Portuguese instruction data sets prepared for this purpose, which are also contributed in this paper. All versions of Gervásio are open source and distributed for free under an open license, for both research and commercial use, and can be run on consumer-grade hardware, thus contributing to the advancement of research and innovation in language technology for Portuguese.
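Since the abstract notes that Gervásio is an openly licensed, LLaMA 2 7B-based instruction-tuned decoder that can run on consumer-grade hardware, the following is a minimal sketch of how such a model would typically be loaded and prompted with the Hugging Face transformers library. The repository id, the half-precision setting, and the Portuguese instruction prompt format are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: loading a LLaMA-2-based, instruction-tuned decoder with
# Hugging Face transformers. The repository id below is an assumption for
# illustration; consult the official Gervásio release for the exact name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PORTULAN/gervasio-7b-portuguese-decoder"  # hypothetical id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision so a 7B model fits on consumer GPUs
    device_map="auto",          # spread layers over available GPU/CPU memory
)

# Instruction-style prompt in Portuguese (format is illustrative only).
prompt = (
    "Instrução: Resume a seguinte frase numa palavra.\n"
    "Texto: O tempo hoje está muito agradável.\n"
    "Resposta:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32, do_sample=False)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```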

Authors (5)
  1. Rodrigo Santos (10 papers)
  2. João Silva (10 papers)
  3. Luís Gomes (7 papers)
  4. João Rodrigues (17 papers)
  5. António Branco (14 papers)
Citations (9)
