
MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models (2407.10953v3)

Published 15 Jul 2024 in cs.CL

Abstract: The Mutual Reinforcement Effect (MRE) represents a promising avenue in information extraction and multitasking research. Nevertheless, its applicability has been constrained because MRE mix datasets have been available exclusively in Japanese, limiting comprehensive exploration by the global research community. To address this limitation, we introduce a Multilingual MRE mix dataset (MMM) that encompasses 21 sub-datasets in English, Japanese, and Chinese. In this paper, we also propose a method for dataset translation assisted by LLMs, which significantly reduces the manual annotation time required for dataset construction by leveraging LLMs to translate the original Japanese datasets. Additionally, we have enriched the dataset by incorporating open-domain Named Entity Recognition (NER) and sentence classification tasks. Utilizing this expanded dataset, we developed a unified input-output framework to train an Open-domain Information Extraction LLM (OIELLM). The OIELLM model effectively processes the novel MMM datasets and exhibits significant improvements in performance. The OIELLM model and datasets are open-sourced on Hugging Face: https://ganchengguang.github.io/MRE/
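The abstract's "unified input-output framework" casts every sub-task (word-level NER, sentence classification, etc.) into a single prompt/target text format so one model can be trained on all 21 sub-datasets. A minimal sketch of such a scheme is below; the template, function names, and label delimiters are illustrative assumptions, not the exact format used by OIELLM.

```python
# Hypothetical sketch of a unified input-output scheme for a multi-task
# IE mix dataset: every task becomes "task prefix + text -> serialized labels".
# The template and separators here are assumptions for illustration only.

def format_example(task: str, text: str, outputs: list[str]) -> dict:
    """Serialize one sub-dataset example into a single prompt/target pair."""
    prompt = f"Task: {task}\nText: {text}\nOutput:"
    target = "; ".join(outputs)  # word-level labels joined into one string
    return {"prompt": prompt, "target": target}

# Word-level NER and sentence-level classification share one I/O shape,
# which is what lets a single LLM train on the whole mix dataset:
ner = format_example("NER", "Barack Obama visited Tokyo.",
                     ["Barack Obama: person", "Tokyo: location"])
cls = format_example("sentence-classification",
                     "The service was excellent.", ["positive"])

print(ner["prompt"])
print(ner["target"])   # Barack Obama: person; Tokyo: location
print(cls["target"])   # positive
```

Because the sentence-level and word-level tasks share this format, the mutual reinforcement effect between them can be exploited during joint training.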

Authors (12)
  1. Chengguang Gan (14 papers)
  2. Qingyu Yin (44 papers)
  3. Xinyang He (2 papers)
  4. Hanjun Wei (1 paper)
  5. Yunhao Liang (4 papers)
  6. Younghun Lim (1 paper)
  7. Shijian Wang (7 papers)
  8. Hexiang Huang (5 papers)
  9. Qinghao Zhang (13 papers)
  10. Shiwen Ni (34 papers)
  11. Tatsunori Mori (14 papers)
  12. Sunbowen Lee (6 papers)
