
Realistic Zero-Shot Cross-Lingual Transfer in Legal Topic Classification (2206.03785v1)

Published 8 Jun 2022 in cs.CL

Abstract: We consider zero-shot cross-lingual transfer in legal topic classification using the recent MultiEURLEX dataset. Since the original dataset contains parallel documents, which is unrealistic for zero-shot cross-lingual transfer, we develop a new version of the dataset without parallel documents. We use it to show that translation-based methods vastly outperform cross-lingual fine-tuning of multilingually pre-trained models, the best previous zero-shot transfer method for MultiEURLEX. We also develop a bilingual teacher-student zero-shot transfer approach, which exploits additional unlabeled documents of the target language and performs better than a model fine-tuned directly on labeled target language documents.
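One way to picture a dataset "without parallel documents" is to assign every document ID to exactly one language, so that no document appears in translation elsewhere in the data. The sketch below illustrates that idea only. The abstract does not describe the authors' actual procedure, and the function name `make_non_parallel`, the round-robin assignment, and the toy IDs and language codes are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def make_non_parallel(doc_ids, languages, seed=0):
    """Assign each document ID to exactly one language (hypothetical helper).

    Assumption: the source corpus is fully parallel, i.e. every document ID
    is available in every language, so any ID can go to any language.
    """
    rng = random.Random(seed)
    ids = sorted(doc_ids)
    rng.shuffle(ids)  # randomize which documents each language receives

    assignment = defaultdict(list)
    for position, doc_id in enumerate(ids):
        # Round-robin over languages keeps the per-language sizes balanced.
        language = languages[position % len(languages)]
        assignment[language].append(doc_id)
    return dict(assignment)


# Toy usage: 12 document IDs split across three languages.
split = make_non_parallel(range(12), ["en", "de", "fr"])
```

With such a split, a model fine-tuned on one language's documents never sees translations of the documents it is later evaluated on in another language, which is the property the abstract calls realistic for zero-shot transfer.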

Authors (5)
  1. Stratos Xenouleas (2 papers)
  2. Alexia Tsoukara (2 papers)
  3. Giannis Panagiotakis (1 paper)
  4. Ilias Chalkidis (40 papers)
  5. Ion Androutsopoulos (51 papers)