Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
80 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Building a Hebrew Semantic Role Labeling Lexical Resource from Parallel Movie Subtitles (2005.08206v1)

Published 17 May 2020 in cs.CL

Abstract: We present a semantic role labeling resource for Hebrew built semi-automatically through annotation projection from English. This corpus is derived from the multilingual OpenSubtitles dataset and includes short informal sentences, for which reliable linguistic annotations have been computed. We provide a fully annotated version of the data including morphological analysis, dependency syntax and semantic role labeling in both FrameNet and PropBank styles. Sentences are aligned between English and Hebrew, both sides include full annotations and the explicit mapping from the English arguments to the Hebrew ones. We train a neural SRL model on this Hebrew resource exploiting the pre-trained multilingual BERT transformer model, and provide the first available baseline model for Hebrew SRL as a reference point. The code we provide is generic and can be adapted to other languages to bootstrap SRL resources.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (2)
  1. Ben Eyal (3 papers)
  2. Michael Elhadad (11 papers)