Addressing Resource and Privacy Constraints in Semantic Parsing Through Data Augmentation (2205.08675v1)

Published 18 May 2022 in cs.CL and cs.AI

Abstract: We introduce a novel setup for low-resource task-oriented semantic parsing which incorporates several constraints that may arise in real-world scenarios: (1) lack of similar datasets/models from a related domain, (2) inability to sample useful logical forms directly from a grammar, and (3) privacy requirements for unlabeled natural utterances. Our goal is to improve a low-resource semantic parser using utterances collected through user interactions. In this highly challenging but realistic setting, we investigate data augmentation approaches involving generating a set of structured canonical utterances corresponding to logical forms, before simulating corresponding natural language and filtering the resulting pairs. We find that such approaches are effective despite our restrictive setup: in a low-resource setting on the complex SMCalFlow calendaring dataset (Andreas et al., 2020), we observe 33% relative improvement over a non-data-augmented baseline in top-1 match.
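The augmentation loop described in the abstract (logical form, then canonical utterance, then simulated natural language, then filtering) can be sketched roughly as follows. This is only an illustrative skeleton, not the authors' implementation: the function names `augment`, `toy_canonicalize`, `toy_simulate`, and `toy_keep` are hypothetical, and the toy components stand in for the learned models and filters a real system would presumably use.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (natural-language utterance, logical form)


def augment(
    logical_forms: Iterable[str],
    canonicalize: Callable[[str], str],
    simulate: Callable[[str], List[str]],
    keep: Callable[[str, str], bool],
) -> List[Pair]:
    """Generic three-stage augmentation loop (hypothetical helper).

    1. canonicalize: logical form -> structured canonical utterance
    2. simulate:     canonical utterance -> candidate natural utterances
    3. keep:         filter that decides whether a candidate pair is retained
    """
    pairs: List[Pair] = []
    for lf in logical_forms:
        canonical = canonicalize(lf)
        for utterance in simulate(canonical):
            if keep(utterance, lf):
                pairs.append((utterance, lf))
    return pairs


# Toy stand-ins, assumed purely for illustration.
def toy_canonicalize(lf: str) -> str:
    # Strip parentheses and underscores to get a flat, readable string.
    tokens = lf.replace("(", " ").replace(")", " ").replace("_", " ").split()
    return " ".join(tokens)


def toy_simulate(canonical: str) -> List[str]:
    # A real system would generate paraphrases; here we emit one plausible
    # candidate and one empty (bad) candidate so the filter has work to do.
    return [f"please {canonical}", ""]


def toy_keep(utterance: str, lf: str) -> bool:
    # Placeholder filter: drop empty candidates only.
    return bool(utterance.strip())


pairs = augment(
    ["(create_event (title lunch))"],
    toy_canonicalize,
    toy_simulate,
    toy_keep,
)
```

Passing the three stages in as callables keeps the loop independent of how each stage is realized, so a stronger generator or a stricter filter could be swapped in without changing `augment`.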

Authors (6)
  1. Kevin Yang (45 papers)
  2. Olivia Deng (2 papers)
  3. Charles Chen (8 papers)
  4. Richard Shin (18 papers)
  5. Subhro Roy (11 papers)
  6. Benjamin Van Durme (173 papers)
Citations (10)