Punctuation Restoration Improves Structure Understanding without Supervision (2402.08382v2)

Published 13 Feb 2024 in cs.CL

Abstract: Unsupervised learning objectives like LLMing and de-noising constitute a significant part in producing pre-trained models that perform various downstream applications from natural language understanding to conversational tasks. However, despite impressive generative capabilities of recent LLMs, their abilities to capture syntactic or semantic structure within text lag behind. We hypothesize that the mismatch between linguistic performance and competence in machines is attributable to insufficient transfer of linguistic structure knowledge to computational systems with currently popular pre-training objectives. We show that punctuation restoration as a learning objective improves in- and out-of-distribution performance on structure-related tasks like named entity recognition, open information extraction, chunking, and part-of-speech tagging. Punctuation restoration is an effective learning objective that can improve structure understanding and yield a more robust structure-aware representations of natural language.

Citations (1)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Punctuation Restoration Improves Structure Understanding without Supervision (2402.08382v2)

Summary

Related Papers

Tweets