Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning (2307.02053v1)
Abstract: Recently, the release of INSTRUCTEVAL has provided valuable insights into the performance of LLMs that utilize encoder-decoder or decoder-only architectures. Interestingly, despite being introduced four years ago, T5-based LLMs, such as FLAN-T5, continue to outperform the latest decoder-based LLMs, such as LLAMA and VICUNA, on tasks that require general problem-solving skills. This performance discrepancy can be attributed to three key factors: (1) pre-training data, (2) backbone architecture, and (3) instruction dataset. In this technical report, we focus on investigating the impact of the third factor by leveraging VICUNA, an LLM based on LLAMA that has been fine-tuned on ChatGPT conversations. To this end, we fine-tuned VICUNA on a customized instruction dataset collection called FLAN-MINI, which includes a subset of the large-scale FLAN instruction dataset as well as various code-related datasets and conversational datasets derived from ChatGPT/GPT-4, and which comprises a large number of tasks that demand problem-solving skills. Our experimental findings strongly indicate that the enhanced problem-solving abilities of our model, FLACUNA, stem from fine-tuning VICUNA on the FLAN dataset, leading to significant improvements across numerous benchmark datasets in INSTRUCTEVAL. FLACUNA is publicly available at https://huggingface.co/declare-lab/flacuna-13b-v1.0.
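Since the released checkpoint is hosted on the Hugging Face Hub at the link above, a natural way to try it is through the transformers library. The snippet below is a minimal sketch under that assumption: the model id is taken from the abstract, but the loading path (standard causal-LM interface) and the prompt shown are assumptions for illustration, so the model card should be consulted for the official prompt template and any adapter-specific loading steps.

```python
# Minimal sketch: querying the released FLACUNA checkpoint with Hugging Face
# transformers. Assumes the checkpoint loads through the standard causal-LM
# interface; see the model card for the exact prompt template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "declare-lab/flacuna-13b-v1.0"  # from the paper's release link

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # 13B weights; half precision to fit on a single GPU
    device_map="auto",          # requires the accelerate package
)

# Hypothetical instruction-style prompt; not the official FLACUNA format.
prompt = "Solve step by step: a train travels 60 km in 45 minutes. What is its average speed in km/h?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Greedy decoding is used here only to make the example deterministic; sampling parameters can be adjusted as usual.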
- Instructeval: Towards holistic evaluation of instruction-tuned large language models, 2023.
- Stanford alpaca: An instruction-following llama model, 2023. URL https://github.com/tatsu-lab/stanford_alpaca.
- Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality, March 2023. URL https://vicuna.lmsys.org.
- LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
- The flan collection: Designing data and methods for effective instruction tuning. arXiv preprint arXiv:2301.13688, 2023.
- Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
- Competition-level code generation with AlphaCode. Science, 378(6624):1092–1097, 2022. doi: 10.1126/science.abq1158. URL https://doi.org/10.1126/science.abq1158.
- CodeSearchNet challenge: Evaluating the state of semantic code search. arXiv preprint arXiv:1909.09436, 2019.
- Measuring coding challenge competence with APPS. NeurIPS, 2021.
- Sahil Chaudhary. Code alpaca: An instruction-following llama model for code generation. https://github.com/sahil280114/codealpaca, 2023.
- Measuring massive multitask language understanding. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=d7KBjmI3GmQ.
- Beyond the imitation game: Quantifying and extrapolating the capabilities of language models, 2022.
- Challenging BIG-Bench tasks and whether chain-of-thought can solve them. arXiv preprint arXiv:2210.09261, 2022.
- Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374, 2021.
- A general language assistant as a laboratory for alignment, 2021.
- Judging llm-as-a-judge with mt-bench and chatbot arena, 2023.
- Full parameter fine-tuning for large language models with limited resources, 2023.
- Deepanway Ghosal
- Yew Ken Chia
- Navonil Majumder
- Soujanya Poria