
Compositional Semantic Parsing with Large Language Models (2209.15003v2)

Published 29 Sep 2022 in cs.CL and cs.AI

Abstract: Humans can reason compositionally when presented with new tasks. Previous research shows that appropriate prompting techniques enable LLMs to solve artificial compositional generalization tasks such as SCAN. In this work, we identify additional challenges in more realistic semantic parsing tasks with larger vocabulary and refine these prompting techniques to address them. Our best method is based on least-to-most prompting: it decomposes the problem using prompting-based syntactic parsing, then uses this decomposition to select appropriate exemplars and to sequentially generate the semantic parse. This method allows us to set a new state of the art for CFQ while requiring only 1% of the training data used by traditional approaches. Due to the general nature of our approach, we expect similar efforts will lead to new results in other tasks and domains, especially for knowledge-intensive applications.

Citations (87)

Summary

  • The paper introduces least-to-most prompting that decomposes complex semantic parsing tasks into manageable subproblems.
  • It leverages dynamic exemplar selection to reach a state-of-the-art 95% accuracy on CFQ, using only about 1% of the training data, and 99.2% accuracy on COGS.
  • The approach significantly enhances compositional generalization in LLMs, promising applications in legal document analysis and complex query interpretation.

Compositional Semantic Parsing with LLMs

Introduction

The paper "Compositional Semantic Parsing with LLMs" (2209.15003) addresses the challenge of compositional generalization in semantic parsing by leveraging LLMs. Compositionality allows humans to understand infinite novel combinations of known components, a skill that standard neural models struggle with. The authors refine prompting techniques to enable LLMs to perform better on realistic semantic parsing tasks, particularly with a larger vocabulary and more complex grammars compared to previously simplified benchmarks like SCAN.

Methodology

Least-to-Most Prompting

The authors propose least-to-most prompting, a strategy that decomposes complex problems into sequences of simpler subproblems that can be solved incrementally:

  • Decomposition: The input is parsed syntactically to break it down into granular subproblems.
  • Dynamic Exemplar Selection: An exemplar pool provides context for solving subproblems by dynamically selecting examples that match the decomposition.
  • Sequential Solution Generation: Intermediate solutions are generated in sequence and aggregated to form the final output (Figure 1; a sketch of the decomposition step follows the caption below).

    Figure 1: Syntactic parse of a CFQ input and its decomposition into subproblems. Like the input, the subproblems are well-formed sentences.
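
The decomposition step can be sketched as a few-shot prompt over a generic text-completion function. The exemplars and the `complete` callable below are illustrative placeholders rather than the paper's actual prompts or model interface; this is a minimal sketch of the idea, not the authors' implementation.

```python
# Sketch of prompting-based decomposition (hypothetical prompt format).
from typing import Callable, List

DECOMPOSITION_EXEMPLARS = (
    "Q: Did M0 direct and edit M1\n"
    "Subproblems: Did M0 direct M1 | Did M0 edit M1\n\n"
    "Q: Who directed, produced, and wrote M2\n"
    "Subproblems: Who directed M2 | Who produced M2 | Who wrote M2\n\n"
)

def decompose(question: str, complete: Callable[[str], str]) -> List[str]:
    """Prompt an LLM to split `question` into simpler, well-formed subquestions.

    `complete` is any text-completion function mapping a prompt to a completion.
    """
    prompt = DECOMPOSITION_EXEMPLARS + f"Q: {question}\nSubproblems:"
    completion = complete(prompt).strip()
    first_line = completion.splitlines()[0] if completion else ""
    return [s.strip() for s in first_line.split("|") if s.strip()]
```

With a suitable `complete`, `decompose("Did M0 direct and edit M1", complete)` would return the two subquestions, mirroring the kind of decomposition illustrated in Figure 1.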

Implementation Steps

  1. Syntactic Parsing: Using LLMs to predict phrase and clause structures that allow input decomposition. This step is crucial for enabling least-to-most prompting to handle natural language's syntactic complexity.
  2. Exemplar Pool Construction: By sampling around 1% of available data, a relevant subset is curated, ensuring diversity and coverage across potential decomposed structures.
  3. Subproblem Solution: Guided by the decomposition and the selected exemplars, the LLM solves the subproblems in sequence, appending each intermediate solution to the prompt before generating the final parse of the original input (Figure 2; a sketch of this loop follows the caption below).

    Figure 2: Prompt designs for semantic parsing. Chain-of-thought (left) generates intermediate steps before the final output. Dynamic least-to-most (right) first sequentially predicts solutions to subproblems before generating the final output.
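
The exemplar-selection and sequential-solution steps can be sketched in a similar spirit. Here, token-overlap scoring stands in for the paper's decomposition-based matching, and the pool format, prompt layout, and `complete` callable are again illustrative assumptions, not the authors' implementation.

```python
# Sketch of dynamic exemplar selection and sequential (least-to-most) solving.
from typing import Callable, Dict, List, Tuple

def select_exemplars(subproblems: List[str],
                     pool: Dict[str, str],
                     k: int = 4) -> List[Tuple[str, str]]:
    """Rank pool entries (question -> target parse) by token overlap with the
    decomposed subproblems and keep the top k, so that each subproblem is
    likely covered by at least one exemplar."""
    target = set(" ".join(subproblems).lower().split())
    ranked = sorted(pool.items(),
                    key=lambda item: len(target & set(item[0].lower().split())),
                    reverse=True)
    return ranked[:k]

def least_to_most(question: str,
                  subproblems: List[str],
                  pool: Dict[str, str],
                  complete: Callable[[str], str]) -> str:
    """Solve the subproblems in order, appending each solution to the prompt,
    then generate the parse of the full question."""
    context = "".join(f"Q: {q}\nParse: {p}\n\n"
                      for q, p in select_exemplars(subproblems, pool))
    answer = ""
    for q in subproblems + [question]:  # simplest pieces first, full question last
        answer = complete(context + f"Q: {q}\nParse:").strip()
        context += f"Q: {q}\nParse: {answer}\n\n"
    return answer  # parse of the original question
```

Accumulating each intermediate parse in the prompt is what lets the final generation condition on the solved subproblems, which is the core of the dynamic least-to-most design described above.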

Results

The approach achieves significant improvements on both the CFQ and COGS benchmarks:

  • CFQ Benchmark: A new state-of-the-art accuracy of 95%, achieved while using only about 1% of the training data required by traditional approaches; this corresponds to an error-rate reduction of roughly 45% relative to the previous best results.
  • COGS Benchmark: Dynamic least-to-most prompting scores 99.2% accuracy, reinforcing the general utility of the method for compositional generalization tasks (Figure 3).

    Figure 3: Accuracy on COGS generalization set. The COGS data is not SQL-like, and has a more diverse lexicon compared with CFQ.

Discussion

The dynamic least-to-most prompting method demonstrates that LLMs can excel at compositional tasks with minimal data by breaking down complex input and leveraging structured exemplar-driven prompting. This approach affirms that LLMs, when guided appropriately, can surpass specialized models that require extensive training.

The implications for real-world applications are significant, particularly for tasks demanding high precision in language parsing, such as legal document analysis or complex query interpretation. Future directions might explore integrating this approach with other domains requiring systematic generalization beyond language processing.

Conclusion

The paper successfully extends the applicability of LLMs for realistic semantic parsing tasks by introducing a hybrid approach that combines syntactic parsing with strategic exemplar selection. This advance demonstrates substantial gains in both performance and data efficiency, paving the way for broader deployments of LLMs in tasks necessitating compositional generalization.
