SeqGPT: An Out-of-the-box Large Language Model for Open Domain Sequence Understanding (2308.10529v1)

Published 21 Aug 2023 in cs.CL

Abstract: LLMs have shown impressive ability for open-domain NLP tasks. However, LLMs are sometimes too footloose for natural language understanding (NLU) tasks which always have restricted output and input format. Their performances on NLU tasks are highly related to prompts or demonstrations and are shown to be poor at performing several representative NLU tasks, such as event extraction and entity typing. To this end, we present SeqGPT, a bilingual (i.e., English and Chinese) open-source autoregressive model specially enhanced for open-domain natural language understanding. We express all NLU tasks with two atomic tasks, which define fixed instructions to restrict the input and output format but still "open" for arbitrarily varied label sets. The model is first instruction-tuned with extremely fine-grained labeled data synthesized by ChatGPT and then further fine-tuned by 233 different atomic tasks from 152 datasets across various domains. The experimental results show that SeqGPT has decent classification and extraction ability, and is capable of performing language understanding tasks on unseen domains. We also conduct empirical studies on the scaling of data and model size as well as on the transfer across tasks. Our model is accessible at https://github.com/Alibaba-NLP/SeqGPT.

Authors (15)
  1. Tianyu Yu (20 papers)
  2. Chengyue Jiang (11 papers)
  3. Chao Lou (8 papers)
  4. Shen Huang (25 papers)
  5. Xiaobin Wang (39 papers)
  6. Wei Liu (1135 papers)
  7. Jiong Cai (6 papers)
  8. Yangning Li (49 papers)
  9. Yinghui Li (65 papers)
  10. Kewei Tu (74 papers)
  11. Hai-Tao Zheng (94 papers)
  12. Ningyu Zhang (148 papers)
  13. Pengjun Xie (85 papers)
  14. Fei Huang (409 papers)
  15. Yong Jiang (194 papers)
Citations (12)

Summary

An Expert Review of "SeqGPT: An Out-of-the-box LLM for Open Domain Sequence Understanding"

The paper under review introduces SeqGPT, an LLM specifically designed for open-domain natural language understanding (NLU). Recognizing the limitations of existing LLMs in handling tasks with restricted input-output formats, the authors propose a bilingual, autoregressive model tailored to open-domain sequence understanding. SeqGPT can execute a diverse range of NLU tasks by unifying them into two atomic tasks: extraction (EXT) and classification (CLS).
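As a rough illustration of how heterogeneous tasks reduce to these two atomic operations, the sketch below renders a sentiment-classification example and a named-entity-recognition example into fixed-format prompts with open label sets. The template strings are illustrative assumptions, not the exact prompt format released with SeqGPT.

```python
# Hypothetical prompt templates for the two atomic tasks (CLS and EXT).
# The exact wording of SeqGPT's instructions differs; only the idea of a fixed
# format with an arbitrary, caller-supplied label set is taken from the paper.

def classification_prompt(text: str, labels: list[str]) -> str:
    """Atomic classification (CLS): assign labels to the whole input."""
    return f"Input: {text}\nClassify: {', '.join(labels)}\nOutput:"

def extraction_prompt(text: str, labels: list[str]) -> str:
    """Atomic extraction (EXT): pull out spans for each label type."""
    return f"Input: {text}\nExtract: {', '.join(labels)}\nOutput:"

# Sentiment analysis and entity typing both reduce to CLS with different label sets;
# NER and event extraction reduce to EXT.
print(classification_prompt("The movie was breathtaking.", ["positive", "negative"]))
print(extraction_prompt("Alibaba was founded in Hangzhou in 1999.",
                        ["organization", "location", "date"]))
```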

A significant contribution of the paper is the methodology employed to enhance SeqGPT's capabilities. The model is instruction-tuned with synthetic fine-grained labeled data generated by ChatGPT, and subsequently fine-tuned with a corpus spanning 233 atomic tasks from 152 datasets across various domains. This two-stage training procedure is aimed at instilling robust generalization skills before refining task-specific abilities in the model.
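The sketch below outlines this two-stage recipe with the Hugging Face Trainer, purely as a reading aid: the backbone choice, data file names, and hyperparameters are placeholders, not the authors' actual configuration.

```python
# Minimal two-stage tuning sketch; dataset files and hyperparameters are illustrative.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "bigscience/bloomz-560m"  # assumed backbone; the paper also scales to larger variants
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal-LM labels

def tokenize(batch):
    # Each record is an example already rendered into the fixed input/instruction/output format.
    return tokenizer(batch["text"], truncation=True, max_length=1024)

def run_stage(model, data_file: str, output_dir: str):
    ds = load_dataset("json", data_files=data_file)["train"].map(tokenize, batched=True)
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=1,
                             per_device_train_batch_size=8, learning_rate=2e-5)
    Trainer(model=model, args=args, train_dataset=ds, data_collator=collator).train()
    return model

# Stage 1: instruction tuning on ChatGPT-synthesized, extremely fine-grained labels.
model = run_stage(model, "synthetic_chatgpt_labels.jsonl", "stage1")
# Stage 2: fine-tuning on the 233 atomic tasks drawn from 152 datasets.
model = run_stage(model, "atomic_tasks.jsonl", "stage2")
```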

The authors report that SeqGPT outperforms ChatGPT by a substantial margin on a zero-shot NLU benchmark. Detailed evaluation indicates that scaling up model and data size improves performance, with larger models generalizing better across languages and tasks. The work further shows that task diversity contributes more to performance than sheer data volume, distinguishing the methodology from the prevalent practice of simply scaling data and emphasizing a strategic preference for diversity.

SeqGPT's use of fixed input-output formats across tasks contributes to its versatility and keeps it usable even in limited-resource setups. This positions SeqGPT as a practical option for environments where the computational resources needed for task-specific fine-tuning are constrained.
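For readers who want to try the released model, a minimal zero-shot inference sketch follows, assuming the checkpoint can be loaded in Hugging Face format. The model identifier and prompt wording below are assumptions; the linked GitHub repository documents the authors' actual loading code and templates.

```python
# Minimal zero-shot inference sketch; the checkpoint id and prompt template are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "DAMO-NLP/SeqGPT-560M"  # hypothetical id; see github.com/Alibaba-NLP/SeqGPT for the official release
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.float16, device_map="auto")

prompt = "Input: Jack works for Alibaba in Hangzhou.\nExtract: person, organization, location\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens after the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```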

From a theoretical perspective, the paper sheds light on the importance of task diversity over volume. It aligns with findings from other instruction tuning research advocating broader task variety for improved generalization in LLMs. Practically, SeqGPT is potentially transformative for applications needing robust zero-shot performance across a multitude of domains, especially with its proficiency in both English and Chinese.

Future work could optimize prompt templates, probe the nuances of scaling effects, and explore unsupervised data-generation techniques that maintain diversity without sacrificing quality. Additionally, coupling SeqGPT with domain-specific knowledge bases could enhance its applicability to specialized tasks.

Overall, the paper presents a comprehensive exploration of methods to enhance large-scale NLU models, successfully balancing efficiency and effectiveness and marking a step forward in unified approaches to sequence understanding in LLMs. This work provides a solid foundation for further research into versatile, multilingual, and open-domain LLMs.