The paper "Linguistically Conditioned Semantic Textual Similarity" addresses the Semantic Textual Similarity (STS) task, which evaluates how semantically similar two sentences are. Recognizing that existing measures can be ambiguous, the authors delve into Conditional STS (C-STS), which assesses similarity conditioned on specific aspects. They identify numerous issues with the current C-STS dataset, such as annotation errors, ill-defined conditions, and ambiguous task definitions.
The authors reannotate the C-STS validation set and find annotation discrepancies in 55% of the instances. These discrepancies stem from errors in the original labels, unclear conditions, and an overall lack of clarity in the task definition. To address these issues, they propose recasting the task in a question-answering (QA) setting.
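As a rough illustration of this recasting (a minimal sketch only; the prompt wording and function names below are assumptions, not the authors' exact format), each C-STS instance can be turned into two condition-focused questions, one per sentence:

```python
# Minimal sketch of recasting a C-STS instance as question answering.
# The prompt wording and field layout are illustrative assumptions,
# not the exact format used in the paper.

def to_qa_prompts(sentence1: str, sentence2: str, condition: str) -> list[str]:
    """Turn one C-STS instance into two condition-focused QA prompts."""
    question = f"What is {condition.lower()} described in the sentence?"
    return [
        f"Sentence: {sentence1}\nQuestion: {question}\nAnswer:",
        f"Sentence: {sentence2}\nQuestion: {question}\nAnswer:",
    ]


prompts = to_qa_prompts(
    "A brown dog runs across the park.",
    "A golden retriever chases a ball on the grass.",
    "The color of the animal",
)
for p in prompts:
    print(p, end="\n\n")
```

The similarity of the two answers can then be judged directly against the condition, rather than against the full sentences.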
In this setting, a model answers the condition (posed as a question) for each sentence, and the generated answers are then used both for analysis and for training. Built on this QA formulation, an automatic error-identification pipeline flags suspect annotations with an F1 score above 80%, showing that the reformulation makes the C-STS evaluation process substantially easier to audit.
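The error-identification idea can be sketched as follows: if the two condition-specific answers largely agree but the gold label says the sentences are dissimilar under that condition (or the reverse), the instance is flagged for review. The thresholds and the answer-similarity function below are hypothetical placeholders, not the paper's pipeline.

```python
# Sketch of flagging likely annotation errors by comparing answer agreement
# with the gold C-STS label (assumed here to range from 1 = least similar
# to 5 = most similar). `answer_similarity` is a toy stand-in for a learned scorer.

def answer_similarity(ans1: str, ans2: str) -> float:
    """Toy lexical-overlap similarity in [0, 1]; a real pipeline would use a model."""
    t1, t2 = set(ans1.lower().split()), set(ans2.lower().split())
    return len(t1 & t2) / max(len(t1 | t2), 1)


def flag_possible_error(ans1: str, ans2: str, gold_label: int,
                        hi: float = 0.8, lo: float = 0.2) -> bool:
    """Flag when answer agreement and the gold label point in opposite directions."""
    sim = answer_similarity(ans1, ans2)
    if sim >= hi and gold_label <= 2:   # answers match, label says dissimilar
        return True
    if sim <= lo and gold_label >= 4:   # answers differ, label says similar
        return True
    return False


print(flag_possible_error("brown", "golden", gold_label=5))  # True: likely label error
```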
Furthermore, the paper introduces a training method that leverages the generated answers, yielding clear gains over existing baselines. The authors also explore conditionality annotation using typed feature structures (TFS) over entity types, demonstrating through examples that TFS can give the conditions in C-STS data a more rigorous linguistic grounding.
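To make the TFS idea concrete, a condition such as "The color of the animal" can be read as an attribute (color) applied to a typed entity (animal). The encoding below is a schematic sketch of that reading in Python; the field names are illustrative and do not reproduce the authors' annotation schema.

```python
# Schematic encoding of a condition as a typed feature structure:
# an entity type plus the attribute the condition asks about.
# Field names are illustrative, not the paper's annotation schema.
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionTFS:
    entity_type: str   # semantic type of the entity the condition is about
    attribute: str     # the feature of that type being compared


# "The color of the animal" -> compare the COLOR feature of an ANIMAL-typed entity.
cond = ConditionTFS(entity_type="animal", attribute="color")
print(cond)
```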
In summary, the paper both identifies critical issues in the current C-STS dataset and offers concrete remedies that improve dataset quality and model performance. The combination of the QA task setting and TFS-based annotation is a notable step forward for conditional STS.