LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language (2312.09993v1)
Abstract: Large Language Models (LLMs) represent the state of the art in linguistic models designed to equip computers with the ability to comprehend natural language. With an exceptional capacity to capture complex contextual relationships, the LLaMA (Large Language Model Meta AI) family marks a novel advancement in natural language processing by releasing foundation models that improve the natural language understanding abilities of the transformer architecture thanks to their large number of trainable parameters (7, 13, and 70 billion). On many natural language understanding tasks, these models match the performance of proprietary models such as OpenAI's ChatGPT, with the advantage that their weights and code are publicly available for research and commercial use. In this work, we investigate Language Adaptation for LLaMA models, focusing explicitly on the challenge of Italian language coverage. Adopting an open science approach, we explore various tuning approaches to ensure high-quality text generation in Italian, a language underrepresented in the original models' training data, suitable for common downstream tasks. We aim to release effective text generation models with strong linguistic properties for the many tasks that remain challenging for multilingual or general-purpose LLMs. Following this open science philosophy, the study contributes Language Adaptation strategies for the Italian language by introducing the novel LLaMAntino family of Italian LLMs.
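The abstract does not spell out the tuning recipe, but a common way to adapt a LLaMA 2 checkpoint to an underrepresented language on limited hardware is parameter-efficient fine-tuning (LoRA/QLoRA). The sketch below is a minimal illustration of that kind of setup using Hugging Face `transformers` and `peft`; the base checkpoint name, the 4-bit quantization settings, and the LoRA hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of QLoRA-style parameter-efficient adaptation of a LLaMA 2 model.
# Checkpoint name and hyperparameters are illustrative assumptions only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "meta-llama/Llama-2-7b-hf"  # assumed base checkpoint

# Load the base weights in 4-bit NF4 so the 7B model fits on a single GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters to the attention projections; only these low-rank
# matrices are updated during fine-tuning, not the quantized base weights.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # a small fraction of the 7B parameters
```

Because only the adapter matrices are trained, adapting a multi-billion-parameter model to Italian data becomes feasible on a single GPU; a standard supervised fine-tuning loop over Italian instruction data would follow this setup.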
Authors: Pierpaolo Basile, Elio Musacchio, Marco Polignano, Lucia Siciliani, Giuseppe Fiameni, Giovanni Semeraro