Choosing labeled examples to best inform assertion selection

Determine which labeled input–output examples for a specific large language model (LLM) pipeline should be selected to best inform the choice among candidate data quality assertions, so as to maximize failure coverage while keeping the false failure rate low under a limited labeling budget.

Background

SPADE's filtering stage relies on a small set of labeled examples to estimate the false failure rate and coverage of candidate assertions. When labeled data are scarce or unrepresentative, these estimates are noisy and useful assertions can be overlooked, motivating more principled example selection.
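To make the two estimated quantities concrete, here is a minimal Python sketch. The interfaces are hypothetical (assertions are modeled as boolean predicates that flag an output as failing, and labels mark whether an output is a genuine failure); this is an illustration of the estimation, not SPADE's implementation.

```python
from typing import Callable, List, Tuple

# A labeled example pairs an LLM output with a ground-truth label:
# is_bad=True means the output is a genuine failure.
Example = Tuple[str, bool]
Assertion = Callable[[str], bool]  # True = the assertion flags this output

def estimate_ffr_and_coverage(assertion: Assertion,
                              examples: List[Example]) -> Tuple[float, float]:
    """Estimate the false failure rate (fraction of good outputs flagged)
    and the coverage (fraction of truly bad outputs flagged) of one assertion."""
    good = [out for out, bad in examples if not bad]
    bad = [out for out, bad in examples if bad]
    ffr = sum(assertion(o) for o in good) / len(good) if good else 0.0
    cov = sum(assertion(o) for o in bad) / len(bad) if bad else 0.0
    return ffr, cov

# Toy assertion for illustration: flag outputs shorter than 5 characters.
too_short = lambda s: len(s) < 5
labeled = [("hello world", False), ("ok", True), ("", True), ("fine answer", False)]
print(estimate_ffr_and_coverage(too_short, labeled))  # → (0.0, 1.0)
```

With only a handful of labeled examples, both ratios are computed over tiny denominators, which is exactly why the choice of which examples to label matters.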

The authors explicitly note that deciding which examples to label to best aid assertion selection is an open problem related to active learning.

References

"Determining which labeled examples would help best select from the set of assertions is an open question that is reminiscent of active learning."

SPADE: Synthesizing Data Quality Assertions for Large Language Model Pipelines (2401.03038 - Shankar et al., 5 Jan 2024) in Conclusion and Future Work