
Breeze-7B Technical Report (2403.02712v2)

Published 5 Mar 2024 in cs.CL

Abstract: Breeze-7B is an open-source LLM based on Mistral-7B, designed to address the need for improved language comprehension and chatbot-oriented capabilities in Traditional Chinese. This technical report provides an overview of the additional pretraining, finetuning, and evaluation stages of the Breeze-7B model. The Breeze-7B family of base and chat models performs well on language-comprehension and chatbot-oriented tasks, ranking at the top of several benchmarks among models of comparable size.
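
For readers who want to try the chat model, below is a minimal usage sketch with the Hugging Face transformers library. The repository id, checkpoint name, generation settings, and sample prompt are illustrative assumptions, not details taken from the report; check the official release page for the exact checkpoint.

    # Minimal sketch: loading a Breeze-7B chat checkpoint with Hugging Face
    # transformers. The repo id below is an assumption for illustration.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "MediaTek-Research/Breeze-7B-Instruct-v1_0"  # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # Traditional Chinese prompt: "Please briefly introduce yourself."
    prompt = "請簡短地自我介紹。"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

Running this requires the transformers and accelerate packages and enough GPU or CPU memory for a 7B-parameter model.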

Authors (6)
  1. Chan-Jan Hsu
  2. Chang-Le Liu
  3. Feng-Ting Liao
  4. Yi-Chang Chen
  5. Da-shan Shiu
  6. Po-chun Hsu
