To Be or Not To Be a Verbal Multiword Expression: A Quest for Discriminating Features (2007.11381v1)

Published 22 Jul 2020 in cs.CL

Abstract: Automatic identification of multiword expressions (MWEs) is a prerequisite for semantically-oriented downstream applications. This task is challenging because MWEs, especially verbal ones (VMWEs), exhibit surface variability. However, this variability is usually more restricted than in regular (non-VMWE) constructions, which leads to various variability profiles. We use this fact to determine the optimal set of features which could be used in a supervised classification setting to solve a subproblem of VMWE identification: the identification of occurrences of previously seen VMWEs. Surprisingly, a simple custom frequency-based feature selection method proves more efficient than other standard methods such as the chi-squared test, information gain or decision trees. An SVM classifier using the optimal set of only 6 features outperforms the best systems from a recent shared task on the French seen data.

Authors (6)
  1. Caroline Pasquer (1 paper)
  2. Agata Savary (3 papers)
  3. Jean-Yves Antoine (2 papers)
  4. Carlos Ramisch (4 papers)
  5. Nicolas Labroche (3 papers)
  6. Arnaud Giacometti (1 paper)
Citations (1)
