
Efficient Learned Query Execution over Text and Tables [Technical Report]

Published 29 Oct 2024 in cs.DB (arXiv:2410.22522v1)

Abstract: In this paper, we present ELEET, a novel execution engine that allows one to seamlessly query and process text as a first-class citizen along with tables. To enable this integration, ELEET leverages learned multi-modal operators (MMOps) such as joins and unions that combine structured with unstructured textual data. While large language models (LLMs) such as GPT-4 are interesting candidates for enabling such learned multi-modal operations, we deliberately do not follow this trend, since it would result in high overhead at query runtime. Instead, ELEET comes with a more efficient small language model (SLM) targeted at extracting structured data from text. Thanks to our novel architecture and pre-training procedure, the ELEET model enables high-accuracy extraction with low overhead. In our evaluation, we compare query execution based on ELEET to baselines leveraging LLMs such as GPT-4 and show that ELEET can speed up multi-modal queries over tables and text by up to 575x without sacrificing accuracy.

Authors (2): Matthias Urban and Carsten Binnig

Summary

  • The paper introduces ELEET, a novel engine that integrates text and table data using learned multi-modal operators.
  • It replaces autoregressive decoding with a single-pass extractive approach, speeding up multi-modal queries by up to 575x over LLM-based baselines.
  • ELEET extends traditional databases by enabling rapid, accurate multi-modal querying with minimal pre-processing for diverse data types.

Overview of ELEET: Efficient Learned Query Execution Over Text and Tables

The paper "ELEET: Efficient Learned Query Execution over Text and Tables" outlines an innovative execution engine designed to facilitate seamless querying of both textual and tabular data. Traditional relational databases are adept at handling structured tabular data but fall short when it comes to multi-modal data, such as text and images. ELEET addresses this limitation by enabling multi-modal queries that incorporate both structured tables and unstructured text.

Core Contributions and Architecture

ELEET's central contribution is its use of learned multi-modal operators (MMOps), including joins and unions, that cohesively integrate structured and textual data. The system is underpinned by a small language model (SLM) tailored to efficiently extract structured data from text. The model's compactness stands in stark contrast to LLMs like GPT-4 and significantly improves efficiency. The ELEET model's architecture and pre-training enable rapid, high-accuracy data extraction with minimal overhead.
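To make the idea of a multi-modal join concrete, here is a minimal sketch of how such an MMOp could behave. The regex-based extractor below is a toy stand-in for ELEET's learned SLM, and the `patients`/`reports` data and attribute names are invented purely for illustration:

```python
import re

def extract_value(document, attribute):
    """Toy stand-in for ELEET's learned extraction: pull the value of
    `attribute` out of free text with a naive 'attribute: value' pattern."""
    match = re.search(rf"{attribute}\s*[:=]\s*([\w-]+)", document, re.IGNORECASE)
    return match.group(1) if match else None

def multimodal_join(table, documents, join_attr):
    """Join structured rows with text documents on `join_attr`:
    each document yields a partial row via extraction, which is then
    matched against the table like an ordinary equi-join."""
    results = []
    for doc in documents:
        value = extract_value(doc, join_attr)
        for row in table:
            if row[join_attr] == value:
                results.append({**row, "report": doc})
    return results

patients = [{"name": "Alice", "age": 52}, {"name": "Bob", "age": 47}]
reports = ["name: Alice. Diagnosed with mild hypertension."]
joined = multimodal_join(patients, reports, "name")
```

The key point of the design is that extraction turns each document into a partial tuple, after which the join itself is entirely conventional relational machinery.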

Key to this efficiency is the replacement of the computationally expensive autoregressive decoding used by LLMs with a single-pass extractive approach. The compact model is pre-trained specifically for table-extraction tasks and often demonstrates greater accuracy than larger models. ELEET can also use table data as context during extraction, which refines output quality and improves efficiency in cases where a text contains multiple candidate values.
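The cost difference between the two decoding styles can be sketched in a few lines. The "encoder outputs" and scoring heads below are random toy values, not ELEET's actual model; the point is only that extraction reads the answer off one forward pass with two argmaxes, while generation pays one model call per output token:

```python
import random

random.seed(0)
tokens = ["Patient", "Alice", "was", "admitted", "on", "May", "5"]
# Pretend encoder: one fixed-size vector per input token (toy values).
hidden = [[random.gauss(0, 1) for _ in range(8)] for _ in tokens]
w_start = [random.gauss(0, 1) for _ in range(8)]
w_end = [random.gauss(0, 1) for _ in range(8)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Extractive head: a SINGLE pass scores every token as a span start/end;
# the answer span is read off with two argmaxes, no generation loop.
start_scores = [dot(h, w_start) for h in hidden]
start = max(range(len(tokens)), key=start_scores.__getitem__)
end_scores = [dot(h, w_end) for h in hidden[start:]]
end = start + max(range(len(end_scores)), key=end_scores.__getitem__)
span = tokens[start:end + 1]  # extracted value, one pass total

# Autoregressive decoding instead needs one model call per output
# token, so its cost grows linearly with the answer length.
def decoding_calls(answer_len):
    return answer_len
```

This is why extraction latency is roughly constant in the answer length, whereas generative baselines slow down as answers get longer.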

Numerical Results and Evaluation

The paper demonstrates substantial improvements in both speed and accuracy when executing multi-modal queries with ELEET compared to baselines built on LLMs such as GPT-4. In the evaluation, ELEET achieved execution speeds up to 575 times faster than these larger models without a reduction in accuracy. This performance is attributed to ELEET's task-specific optimization, its compact model size, and an architecture that favors extraction over generation, which significantly reduces latency.

Implications and Future Directions

Practically, ELEET offers a way to extend existing databases to handle non-tabular data efficiently, integrating it into existing workflows with minimal pre-processing or manual effort from data scientists. Theoretically, the approach sets a precedent for leveraging small, task-focused language models in domain-specific applications, challenging the dominance of large LLMs in contexts where efficiency and resource constraints are critical. Future work could extend ELEET's principles to other data modalities, such as images, further broadening its applicability.

Moreover, the use of an open pre-training corpus, as introduced by the authors, provides a valuable resource that could be used for further training and evaluation of similar models, advocating for a shared community resource to enhance model robustness.

In conclusion, ELEET represents a targeted, efficient solution for multi-modal data processing in database systems, providing a compelling alternative to resource-intensive LLM approaches. Its success highlights the benefits of specialized, efficient models in data management tasks and lays the groundwork for future exploration and integration of other data modalities in similar frameworks.
