
Revisiting Prompt Engineering via Declarative Crowdsourcing (2308.03854v1)

Published 7 Aug 2023 in cs.DB, cs.AI, cs.HC, and cs.LG

Abstract: LLMs are incredibly powerful at comprehending and generating data in the form of text, but are brittle and error-prone. There has been an advent of toolkits and recipes centered around so-called prompt engineering-the process of asking an LLM to do something via a series of prompts. However, for LLM-powered data processing workflows, in particular, optimizing for quality, while keeping cost bounded, is a tedious, manual process. We put forth a vision for declarative prompt engineering. We view LLMs like crowd workers and leverage ideas from the declarative crowdsourcing literature-including leveraging multiple prompting strategies, ensuring internal consistency, and exploring hybrid-LLM-non-LLM approaches-to make prompt engineering a more principled process. Preliminary case studies on sorting, entity resolution, and imputation demonstrate the promise of our approach.

Revisiting Prompt Engineering via Declarative Crowdsourcing

In the paper entitled "Revisiting Prompt Engineering via Declarative Crowdsourcing," Parameswaran et al. propose a novel approach for leveraging LLMs in the design of efficient and accurate data processing workflows. Originating from UC Berkeley, the authors view prompt engineering through the lens of declarative crowdsourcing, borrowing concepts from the established literature to manage the inherent brittleness and error-proneness of LLMs.

Key Ideas and Approaches

The central thesis of the paper is to treat LLMs as "noisy human oracles," drawing a parallel between LLM-induced errors and those made by human crowd workers. This analogy is leveraged to explore and improve LLM prompt engineering by adopting strategies from the crowdsourcing literature. The paper introduces several principles to enhance the consistency, accuracy, and efficiency of data-driven workflows using LLMs.

  1. Diverse Prompting Strategies: As in crowdsourcing, where multiple strategies for the same task yield varied outcomes, prompt engineering with LLMs benefits from exploring coarse-to-fine and hybrid strategies. The authors demonstrate this through case studies on sorting, where fine-grained, pairwise comparisons improved accuracy over simple, one-shot prompts (see the first sketch after this list).
  2. Hybrid Prompt Engineering: The hybrid model combines coarse initial processing with fine-grained secondary prompts, thereby efficiently narrowing down problems. This is particularly suitable for handling larger datasets where LLMs may struggle due to context length limitations.
  3. Internal Consistency: Exploiting task-intrinsic properties, such as transitivity in entity resolution, enables consistency checks that improve LLM outputs. When LLM predictions violate known consistency rules, secondary verification or complementary methods can help rectify the errors.
  4. Integration with Non-LLM Approaches: To mitigate costs, the paper describes using cheaper methods, such as embedding-based techniques or simpler machine learning models, to perform pre-processing or to handle less uncertain cases. This reduces unnecessary LLM invocations, reserving them for more ambiguous scenarios (see the second sketch after this list).
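To make the first two principles concrete, here is a minimal Python sketch of pairwise LLM sorting with a coarse-to-fine (hybrid) refinement. The helpers `coarse_rank` and `llm_compare` are hypothetical stand-ins for actual LLM calls, and the shortlist size is an illustrative parameter; this is a sketch of the general idea, not the authors' implementation.

```python
from itertools import combinations

def coarse_rank(items: list[str]) -> list[str]:
    """Hypothetical coarse pass: a single one-shot LLM prompt that orders
    the whole list at once (cheap, but error-prone)."""
    raise NotImplementedError("wire this to an LLM provider")

def llm_compare(a: str, b: str) -> str:
    """Hypothetical fine-grained pass: ask the LLM which of two items ranks
    higher and return the winner."""
    raise NotImplementedError("wire this to an LLM provider")

def pairwise_sort(items: list[str]) -> list[str]:
    # Fine-grained strategy: one LLM call per pair; rank items by wins.
    wins = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        wins[llm_compare(a, b)] += 1
    return sorted(items, key=lambda x: wins[x], reverse=True)

def hybrid_sort(items: list[str], shortlist: int = 10) -> list[str]:
    # Coarse-to-fine strategy: a cheap coarse ordering narrows the set,
    # and the costlier pairwise comparisons are spent only on the head.
    coarse = coarse_rank(items)
    return pairwise_sort(coarse[:shortlist]) + coarse[shortlist:]
```

The fourth principle admits a similar sketch: a cheap non-LLM proxy (here, embedding cosine similarity with illustrative thresholds) decides the easy entity-resolution cases, and the LLM is invoked only for the ambiguous middle band. Again, `embed` and `llm_same_entity` are hypothetical helpers.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical call to a sentence-embedding model."""
    raise NotImplementedError

def llm_same_entity(a: str, b: str) -> bool:
    """Hypothetical LLM call asking whether two records refer to the same entity."""
    raise NotImplementedError

def is_match(a: str, b: str, low: float = 0.3, high: float = 0.9) -> bool:
    # Cheap proxy first: cosine similarity of embeddings decides clear cases.
    va, vb = embed(a), embed(b)
    sim = float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))
    if sim >= high:
        return True   # confidently a match, no LLM call needed
    if sim <= low:
        return False  # confidently a non-match, no LLM call needed
    return llm_same_entity(a, b)  # ambiguous band: fall back to the LLM
```

A transitivity check in the spirit of the third principle would then re-query the LLM whenever its pairwise match decisions imply a contradiction (e.g., A matches B and B matches C, but A does not match C).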

Empirical Validation and Results

The paper supports its declarative approach with empirical evidence. In the sorting case study, using the LLM for pairwise comparisons improved Kendall Tau-β values relative to a single-prompt ordering. In data imputation, example-driven prompts integrated with non-LLM proxies achieved higher accuracy and efficiency, indicating that LLM-generated solutions are improved by smaller, adaptive queries aimed at specific decision points in the workflow.
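For reference, Kendall Tau measures rank agreement between the LLM-produced ordering and a ground-truth ordering (1 means identical rankings, -1 means fully reversed). Below is a minimal sketch of computing such a score with SciPy on illustrative data; SciPy's default tau-b variant may differ in detail from the paper's exact metric.

```python
from scipy.stats import kendalltau

ground_truth = ["a", "b", "c", "d", "e"]   # reference ordering
llm_ordering = ["a", "c", "b", "d", "e"]   # ordering produced by an LLM pipeline

# Rank of each ground-truth item under each ordering.
rank_true = [ground_truth.index(x) for x in ground_truth]
rank_pred = [llm_ordering.index(x) for x in ground_truth]

tau, p_value = kendalltau(rank_true, rank_pred)  # tau-b is SciPy's default variant
print(f"Kendall Tau = {tau:.3f} (p = {p_value:.3f})")
```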

Implications and Future Research

A salient implication of this work is the potential scalability and reliability of LLMs in complex data processing workflows, driven by systematic prompt crafting. By merging the robustness of declarative crowdsourcing with the language processing capabilities of LLMs, the authors provide an adaptable framework that could support more sophisticated use cases in practice.

Future research could expand on this foundational work by integrating feedback mechanisms from machine learning, such as active learning, to dynamically inform when LLMs should be employed versus cheaper, static models. Additionally, a more comprehensive understanding of how prompt modifications affect LLM behavior is needed to minimize brittleness. Overall, this research marks a meaningful step toward more reliable artificial intelligence applications, blending human-like intuition with machine precision.

Authors (5)
  1. Aditya G. Parameswaran (18 papers)
  2. Shreya Shankar (19 papers)
  3. Parth Asawa (5 papers)
  4. Naman Jain (34 papers)
  5. Yujie Wang (103 papers)
Citations (16)