Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
41 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Bilingual Terminology Extraction from Comparable E-Commerce Corpora (2104.07398v2)

Published 15 Apr 2021 in cs.CL

Abstract: Bilingual terminologies are important machine translation resources in the field of e-commerce, which are usually either manually translated or automatically extracted from parallel data. The human translation is costly and e-commerce parallel corpora is very scarce. However, the comparable data in different languages in the same commodity field is abundant. In this paper, we propose a novel framework of extracting e-commercial bilingual terminologies from comparable data. Benefiting from the cross-lingual pre-training in e-commerce, our framework can make full use of the deep semantic relationship between source-side terminology and target-side sentence to extract corresponding target terminology. Experimental results on various language pairs show that our approaches achieve significantly better performance than various strong baselines.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (4)
  1. Hao Jia (55 papers)
  2. Shuqin Gu (2 papers)
  3. Yuqi Zhang (54 papers)
  4. Xiangyu Duan (10 papers)
Citations (1)