
Olapa-MCoT: Enhancing the Chinese Mathematical Reasoning Capability of LLMs (2312.17535v1)

Published 29 Dec 2023 in cs.AI, cs.CL, and cs.HC

Abstract: Chain-of-Thought (CoT) prompting is a way for LLMs to solve reasoning problems, and much recent research aims to improve the CoT capability of LLMs. In this work, we propose Olapa-MCoT, an LLM built on the llama2-13B pretrained model through finetuning and alignment learning. During alignment training, we propose the SimRRHF algorithm and Incorrect Data Relearning, focusing mainly on optimizing the Chinese mathematical reasoning ability of Olapa-MCoT. The experiments achieve significant results: Chinese mathematical reasoning accuracy reaches 50%, a 36% rise over llama2-13B, and English reasoning accuracy also increases by nearly 4%.
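The abstract does not detail SimRRHF, but its name suggests it builds on RRHF (Yuan et al., 2023), which aligns a model by ranking its own candidate responses: candidates whose reward is higher should also receive higher (length-normalized) log-probability under the model. A minimal sketch of the RRHF-style ranking loss follows; the function and variable names are hypothetical, and this is the base RRHF objective, not the paper's SimRRHF variant.

```python
def rrhf_rank_loss(scores, rewards):
    """RRHF-style ranking loss (sketch).

    scores:  length-normalized log-probabilities the model assigns
             to each candidate response.
    rewards: quality scores for the same candidates (e.g., from a
             reward model or human ranking).

    For every pair where candidate i is rewarded above candidate j,
    penalize the model if it scores j at least as high as i.
    """
    loss = 0.0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if rewards[i] > rewards[j]:
                loss += max(0.0, scores[j] - scores[i])
    return loss


# In full RRHF this ranking term is combined with a standard
# cross-entropy (SFT) loss on the best-rewarded response.
```

For example, if the better-rewarded response already has the higher model score, the ranking term is zero; otherwise the gap between the two scores is penalized directly.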

Authors (6)
  1. Shaojie Zhu (1 paper)
  2. Zhaobin Wang (1 paper)
  3. Chengxiang Zhuo (6 papers)
  4. Hui Lu (38 papers)
  5. Bo Hu (110 papers)
  6. Zang Li (15 papers)
