MoTCoder: Elevating Large Language Models with Modular of Thought for Challenging Programming Tasks (2312.15960v3)
Abstract: Large Language Models (LLMs) have showcased impressive capabilities in handling straightforward programming tasks. However, their performance tends to falter when confronted with more challenging programming problems. We observe that conventional models often generate solutions as monolithic code blocks, restricting their effectiveness in tackling intricate questions. To overcome this limitation, we present Modular-of-Thought Coder (MoTCoder). We introduce a framework for Modular-of-Thought (MoT) instruction tuning, designed to promote the decomposition of tasks into logical sub-tasks and sub-modules. Our investigations reveal that, through the cultivation and utilization of sub-modules, MoTCoder significantly improves both the modularity and correctness of the generated solutions, leading to substantial relative pass@1 improvements of 12.9% on APPS and 9.43% on CodeContests. Our code is available at https://github.com/dvlab-research/MoTCoder.
Authors: Jingyao Li, Pengguang Chen, Jiaya Jia, Bin Xia, Hong Xu
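To make the modular style described in the abstract concrete, below is a minimal illustrative sketch (not code from the paper; the task, function names, and decomposition are hypothetical) of a solution organized into named sub-modules, each handling one sub-task, composed by a top-level `solve` function rather than emitted as a single monolithic block.

```python
# Illustrative sketch only: the sub-module decomposition mimics the modular
# style described in the abstract; the task and names are hypothetical.

def parse_input(raw: str) -> list[int]:
    """Sub-module 1: parse the raw problem input into a list of integers."""
    return [int(tok) for tok in raw.split()]

def longest_increasing_run(values: list[int]) -> int:
    """Sub-module 2: length of the longest strictly increasing contiguous run."""
    best = current = 1 if values else 0
    for prev, nxt in zip(values, values[1:]):
        current = current + 1 if nxt > prev else 1
        best = max(best, current)
    return best

def solve(raw: str) -> str:
    """Top-level solution: compose the sub-modules into the final answer."""
    return str(longest_increasing_run(parse_input(raw)))

if __name__ == "__main__":
    assert solve("1 2 3 1 2") == "3"
    print(solve("5 4 3 2 1"))  # -> 1
```

In the monolithic style the same logic would typically appear as one undivided script; the paper's actual MoT instruction-tuning prompt and output format may differ from this toy layout.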