Raw Text is All you Need: Knowledge-intensive Multi-turn Instruction Tuning for Large Language Model (2407.03040v1)

Published 3 Jul 2024 in cs.CL and cs.AI

Abstract: Instruction tuning is an effective technique for aligning the outputs of LLMs with human preference. However, how to generate knowledge-intensive multi-turn dialogues from raw documents for instruction tuning still requires further exploration. In this paper, we present a novel framework named R2S that leverages the CoD (Chain of Dialogue) logic to guide LLMs in generating knowledge-intensive multi-turn dialogues for instruction tuning. By integrating raw documents from both open-source datasets and domain-specific web-crawled documents into a benchmark K-BENCH, we cover diverse areas such as Wikipedia (English), Science (Chinese), and Artifacts (Chinese). Our approach first decides the logic flow of the current dialogue and then prompts LLMs to produce key phrases for sourcing relevant response content. This methodology enables the creation of the GINSTRUCT instruction dataset, retaining raw document knowledge within dialogue-style interactions. Utilizing this dataset, we fine-tune GLLM, a model designed to transform raw documents into structured multi-turn dialogues, thereby injecting comprehensive domain knowledge into the SFT model for enhanced instruction tuning. This work signifies a stride towards refining the adaptability and effectiveness of LLMs in processing and generating more accurate, contextually nuanced responses across various fields.

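The abstract describes a two-stage generation process: plan the dialogue's logic flow first, then source key phrases from the raw document to ground each turn's response. Below is a minimal sketch of how such a pipeline might be wired together, assuming a generic `call_llm` chat-completion helper; the prompts, function names, and turn structure are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of an R2S-style pipeline (prompts and names are illustrative).
# `call_llm` is a placeholder for any chat-model API that maps a prompt string to text.

from typing import Callable, Dict, List


def build_dialogue(
    raw_document: str,
    call_llm: Callable[[str], str],
    num_turns: int = 3,
) -> List[Dict[str, str]]:
    """Turn a raw document into a multi-turn, dialogue-style training record.

    Step 1 (Chain of Dialogue): plan the logic flow of the dialogue up front.
    Step 2: for each planned turn, extract key phrases from the document,
            then generate a question/answer pair grounded in those phrases.
    """
    # Step 1: plan the dialogue logic flow, one topic per line.
    plan_prompt = (
        "Read the document below and outline a logical flow for a "
        f"{num_turns}-turn dialogue about it, one topic per line.\n\n{raw_document}"
    )
    plan = [line.strip() for line in call_llm(plan_prompt).splitlines() if line.strip()]

    dialogue: List[Dict[str, str]] = []
    for topic in plan[:num_turns]:
        # Step 2a: key phrases that point at the relevant document content.
        phrases = call_llm(
            f"List key phrases from the document relevant to: {topic}\n\n{raw_document}"
        )
        # Step 2b: generate the turn, keeping the answer grounded in the key phrases.
        question = call_llm(f"Write a user question about: {topic}")
        answer = call_llm(
            "Answer the question using only the document content indicated by "
            f"these key phrases.\nQuestion: {question}\nKey phrases: {phrases}\n\n"
            f"{raw_document}"
        )
        dialogue.append({"user": question, "assistant": answer})
    return dialogue
```

The list of user/assistant turns returned here is the kind of dialogue-style record that a GINSTRUCT-like dataset would collect for supervised fine-tuning.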
Authors (11)
  1. Xia Hou (3 papers)
  2. Qifeng Li (46 papers)
  3. Jian Yang (503 papers)
  4. Tongliang Li (18 papers)
  5. Linzheng Chai (16 papers)
  6. Xianjie Wu (7 papers)
  7. Hangyuan Ji (4 papers)
  8. Zhoujun Li (122 papers)
  9. Jixuan Nie (1 paper)
  10. Jingbo Dun (1 paper)
  11. Wenfeng Song (3 papers)
Citations (2)