
MoZIP: A Multilingual Benchmark to Evaluate Large Language Models in Intellectual Property (2402.16389v1)

Published 26 Feb 2024 in cs.CL and cs.AI

Abstract: LLMs have demonstrated impressive performance in various NLP tasks. However, there is limited understanding of how well LLMs perform in specific domains (e.g., the intellectual property (IP) domain). In this paper, we contribute a new benchmark, the first Multilingual-oriented quiZ on Intellectual Property (MoZIP), for the evaluation of LLMs in the IP domain. The MoZIP benchmark includes three challenging tasks: IP multiple-choice quiz (IPQuiz), IP question answering (IPQA), and patent matching (PatentMatch). In addition, we develop a new IP-oriented multilingual LLM (called MoZi), a BLOOMZ-based model that has been supervised fine-tuned with multilingual IP-related text data. We evaluate our proposed MoZi model and four well-known LLMs (i.e., BLOOMZ, BELLE, ChatGLM, and ChatGPT) on the MoZIP benchmark. Experimental results demonstrate that MoZi outperforms BLOOMZ, BELLE, and ChatGLM by a noticeable margin, while it still scores lower than ChatGPT. Notably, the performance of current LLMs on the MoZIP benchmark leaves much room for improvement, and even the most powerful ChatGPT does not reach the passing level. Our source code, data, and models are available at https://github.com/AI-for-Science/MoZi.
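
Since IPQuiz is a multiple-choice task, model outputs can be scored by simple accuracy against an answer key. Below is a minimal sketch of such a scorer; the item fields ("id", "answer") and the prediction format are illustrative assumptions, not the schema of the released MoZIP data.

```python
"""Minimal sketch: accuracy scoring for an IPQuiz-style multiple-choice task."""

def score_ipquiz(predictions, items):
    """Return accuracy of `predictions` (item id -> chosen option letter)
    against `items` (dicts carrying the gold answer). The field names
    "id" and "answer" are assumptions for illustration only."""
    if not items:
        return 0.0
    correct = sum(
        1 for item in items
        if predictions.get(item["id"]) == item["answer"]
    )
    return correct / len(items)


if __name__ == "__main__":
    # Hypothetical items in the assumed format; the real MoZIP data is
    # multilingual and distributed via the linked repository.
    items = [
        {"id": "q1", "options": ["A", "B", "C", "D"], "answer": "B"},
        {"id": "q2", "options": ["A", "B", "C", "D"], "answer": "D"},
    ]
    predictions = {"q1": "B", "q2": "A"}  # e.g., parsed from model outputs
    print(f"IPQuiz accuracy: {score_ipquiz(predictions, items):.2f}")  # 0.50
```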

Authors (12)
  1. Shiwen Ni (34 papers)
  2. Minghuan Tan (15 papers)
  3. Yuelin Bai (13 papers)
  4. Fuqiang Niu (9 papers)
  5. Min Yang (239 papers)
  6. Bowen Zhang (161 papers)
  7. Ruifeng Xu (66 papers)
  8. Xiaojun Chen (100 papers)
  9. Chengming Li (28 papers)
  10. Xiping Hu (46 papers)
  11. Ye Li (155 papers)
  12. Jianping Fan (51 papers)
Citations (6)
