Cortex AISQL: AI-Powered SQL Engine
- Cortex AISQL is an SQL engine that seamlessly integrates AI-driven semantic operators into declarative queries for advanced inference tasks.
- The architecture extends Snowflake by embedding LLM and foundation model calls into query plans for tasks like classification, summarization, and content-based joins.
- Adaptive model cascades and semantic join rewrites optimize cost and latency, achieving up to 30× runtime improvements and enhanced predictive quality.
Cortex AISQL is a production SQL engine embedding native semantic operations, notably LLM and foundation model inference, directly into the declarative SQL query paradigm. Designed and deployed within Snowflake’s cloud data platform, Cortex AISQL enables users to issue queries that combine canonical relational operators with semantic tasks such as classification, completion, summarization, and content-based joins, thereby facilitating unified access to both structured and unstructured data at scale. The integration of LLM-driven operators as core query plan primitives brings unprecedented expressivity but introduces fundamental challenges around cost, latency, selectivity, and runtime optimization, distinct from those in traditional relational engines.
1. System Architecture
Cortex AISQL extends Snowflake’s multi-tiered architecture (separating storage, compute, and Cloud Services) via the addition of the Cortex Platform for LLM inference.
- Query Lifecycle: User-written SQL is parsed and compiled in the Cloud Services control plane, with the AISQL Optimizer injecting new AI operator primitives (AI_FILTER, AI_JOIN, AI_CLASSIFY, AI_COMPLETE, AI_AGG, AI_SUMMARIZE_AGG) into the execution plan.
- Virtual Warehouses: Parallel compute clusters execute these plans, emitting AI operator calls to the Cortex Platform during runtime, instead of solely relying on relational primitives.
- Cortex Platform: The API Service (fronting both REST and batch SQL-driven inference) passes requests to a Scheduler, which dispatches work to GPU-backed Inference Engines (e.g., using vLLM for LLMs and multimodal models).
- Routing Logic: The Scheduler optimizes GPU utilization and model loading locality by routing requests for the same model to already-initialized Inference Engines.
The system design enables concurrent relational and semantic execution using shared infrastructure resources, delegating all LLM computation to the Cortex layer. This modularity supports rapid deployment of new model types and data modalities (text, image, audio).
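The locality-aware routing policy can be sketched as follows. This is a hypothetical illustration only: ModelRouter, engine_load, and the method names are invented for the example and are not the Cortex Scheduler's actual data structures.

```python
from collections import defaultdict

class ModelRouter:
    """Hypothetical locality-aware router: prefer an Inference Engine that
    already has the requested model loaded, to avoid model (re)load latency."""

    def __init__(self, engine_load):
        self.engine_load = engine_load      # engine_id -> outstanding requests
        self.loaded = defaultdict(set)      # model_name -> engines holding it

    def route(self, model_name):
        warm = self.loaded[model_name]
        if warm:
            # Least-loaded engine among those with the model already resident.
            choice = min(warm, key=lambda e: self.engine_load[e])
        else:
            # Cold start: pick the globally least-loaded engine and load there.
            choice = min(self.engine_load, key=self.engine_load.get)
            self.loaded[model_name].add(choice)
        self.engine_load[choice] += 1
        return choice

router = ModelRouter({"gpu-pool-0": 0, "gpu-pool-1": 0})
print(router.route("llama3.1-8b"))  # cold start on one engine
print(router.route("llama3.1-8b"))  # routed to the now-warm engine
```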
2. AI-Aware Query Optimization
Semantic operator invocations (LLM or foundation model calls) are significantly more costly and unpredictable in latency than standard SQL predicates. AISQL addresses this by elevating AI inference cost as a first-class optimization objective in the query planner.
- LLM Cost Model: Each AI operator $i$ (modeled as a black-box UDF) has cost $C_i = N_i \cdot c_i + o_i$, where $N_i$ is the number of input rows/tokens, $c_i$ is the per-row/token GPU cost, and $o_i$ is the fixed per-call overhead. Total LLM plan cost is $C_{\text{LLM}}(P) = \sum_{i \in P} (N_i \cdot c_i + o_i)$ for a plan $P$ with operators $i$.
- Cost-Aware Planning: The optimizer estimates operator selectivities and costs at compile time, weighing the aggregate AI cost ($C_{\text{LLM}}$) alongside CPU and I/O resource consumption. Operator reordering (e.g., filter pushdown/pullup with respect to joins) is chosen to minimize the weighted total cost, as in the pseudocode below (a runnable sketch follows this section's summary):
```
for each plan P in candidate_plans:
    estimate N_i for each AI operator i in P
    estimate C_LLM(P) = sum over i of (N_i * c_i + o_i)
    estimate C_CPUIO(P)
    total_cost(P) = w1 * C_LLM(P) + w2 * C_CPUIO(P)
choose P* = argmin_P(total_cost(P))
```
- Empirical Results: In benchmarks (e.g., filtering 1,000 NYT article titles), selecting the optimal predicate order yields 2–8× speedups over naive strategies. For example, when an IN-predicate passes only a small fraction of rows, evaluating AI_FILTER last (after the cheap filters) means the LLM processes only that surviving fraction, giving ≈7× cost reduction compared to evaluating it first.
This framework ensures that plan enumeration directly accounts for the non-relational cost characteristics and performance bottlenecks of LLM inference.
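As a concrete, non-authoritative illustration of this enumeration, the sketch below scores two orderings of a cheap IN-predicate and AI_FILTER over 1,000 rows. The selectivity and per-row cost constants are invented for illustration and roughly reproduce the ≈7× gap noted above.

```python
# Hypothetical constants: 1,000 input rows, cheap predicate passes ~14% of rows,
# and per-row LLM cost dominates CPU/IO by orders of magnitude.
N_ROWS, SEL_IN = 1_000, 0.14
C_LLM_PER_ROW, C_CPU_PER_ROW = 50.0, 0.01   # arbitrary cost units
W_LLM, W_CPU = 1.0, 1.0

def plan_cost(ai_filter_first: bool) -> float:
    # Rows reaching the AI operator depend on where the cheap filter runs.
    llm_rows = N_ROWS if ai_filter_first else N_ROWS * SEL_IN
    c_llm = llm_rows * C_LLM_PER_ROW
    c_cpu = N_ROWS * C_CPU_PER_ROW            # cheap predicate scans all rows
    return W_LLM * c_llm + W_CPU * c_cpu

plans = {"AI_FILTER first": plan_cost(True), "AI_FILTER last": plan_cost(False)}
best = min(plans, key=plans.get)
print(plans, "->", best)   # evaluating AI_FILTER last is ~7x cheaper here
```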
3. Adaptive Model Cascades
To mitigate the high inference cost of large, high-quality models, AISQL implements adaptive cascades that combine fast, smaller proxy models with large oracle models:
- Proxy/Oracle Composition: A low-latency LLM (e.g., Llama3.1-8B) acts as the proxy, and a higher-accuracy, expensive LLM (e.g., Llama3.3-70B) is the oracle.
- Confidence Thresholding: Proxy output scores $s$ are compared against thresholds $\tau_{\text{low}}$ and $\tau_{\text{high}}$:
- $s < \tau_{\text{low}}$: output NEG, do not escalate.
- $s > \tau_{\text{high}}$: output POS, do not escalate.
- $\tau_{\text{low}} \le s \le \tau_{\text{high}}$: uncertain, escalate to the oracle.
- Threshold Determination: A small, importance-sampled subset (rows with $s$ close to 0.5) is labeled by the oracle. Thresholds are set so that the estimated proportion of misclassified records per class is ≤ a predetermined risk tolerance (e.g., using Clopper–Pearson bounds).
- Cascaded Algorithm (a minimal Python sketch follows this list):
- Proxy inference generates scores $s_j$ for all entries in batch $B$.
- Importance-sample a subset $S \subset B$ (weighted toward $s_j \approx 0.5$).
- Label $S$ with the oracle.
- Estimate thresholds $\tau_{\text{low}}$ and $\tau_{\text{high}}$ from the labeled sample.
- Classify all of $B$ using the thresholds and escalate uncertain entries to the oracle until the budget is exhausted.
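The sketch below illustrates this cascade under simplifying assumptions: proxy_score and oracle_label stand in for actual model calls, the Clopper–Pearson bound is replaced by a plain empirical error check on the labeled sample, and the escalation budget is omitted.

```python
import random

def run_cascade(batch, proxy_score, oracle_label, sample_size=50, risk=0.05):
    """Sketch of a proxy/oracle cascade for a Boolean predicate: the proxy
    scores everything, an importance-weighted sample near s~0.5 is
    oracle-labeled to pick thresholds, and only uncertain rows escalate."""
    items = list(batch)
    scores = {x: proxy_score(x) for x in items}

    # Importance sampling: favor entries whose proxy score is near 0.5.
    weights = [1.0 - abs(scores[x] - 0.5) for x in items]
    sample = set(random.choices(items, weights=weights, k=sample_size))
    labeled = {x: oracle_label(x) for x in sample}

    # Empirical stand-in for the Clopper-Pearson bound: only keep threshold
    # pairs whose error rate on the oracle-labeled sample is <= risk.
    def sample_error(tau_low, tau_high):
        decided = [(x, scores[x] > tau_high) for x in labeled
                   if scores[x] < tau_low or scores[x] > tau_high]
        wrong = sum(pred != labeled[x] for x, pred in decided)
        return wrong / max(len(decided), 1)

    grid = [i / 20 for i in range(21)]
    tau_low, tau_high = max(
        ((lo, hi) for lo in grid for hi in grid
         if lo <= hi and sample_error(lo, hi) <= risk),
        key=lambda t: t[0] - t[1],   # shrink the escalation window as far as allowed
    )

    results = {}
    for x in items:
        s = scores[x]
        if s < tau_low:
            results[x] = False            # confident NEG from the proxy
        elif s > tau_high:
            results[x] = True             # confident POS from the proxy
        else:
            results[x] = oracle_label(x)  # escalate uncertain entries
    return results
```

A production implementation would additionally stop escalating once the per-query oracle budget is exhausted, falling back to the proxy decision for the remaining uncertain rows.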
Performance: On Boolean classification benchmarks, cascades achieve 2–6× speedups while maintaining 90–95% of oracle quality. For example, on seven datasets:
- Oracle-only: 975.9 s, F1=0.812
- Proxy-only: 296.2 s, F1=0.659 (–18.8% F1)
- Cascade: 336.4 s, F1=0.777 (–4.3% F1)
This demonstrates that most entries can be confidently and cheaply handled by the proxy, with only uncertain cases incurring high oracle cost.
4. Semantic Join Query Rewriting
The semantic join (AI_JOIN) between two tables $A$ and $B$ using LLM-based comparison is naively quadratic in cost, requiring $|A| \times |B|$ model invocations.
- Multi-Label Rewriting: Instead of pairwise filters,
- Aggregate all candidate labels from table $B$ into an array.
- For each row $a$ in $A$, call AI_CLASSIFY($a$.text, [labels]), returning the subset of matching labels.
- Reconstruct join pairs $(a, b)$ for each matched label $b$.
SQL Example:
- Quadratic:

```sql
SELECT *
FROM Reviews AS A
JOIN Categories AS B
  ON AI_FILTER(PROMPT('Review {0} ∈ category {1}', A.review, B.label));
```
- Linear:

```sql
WITH all_labels AS (
  SELECT ARRAY_AGG(label) AS lbls FROM Categories
)
SELECT
  A.*,
  EXPLODE(
    AI_CLASSIFY(
      A.review,
      (SELECT lbls FROM all_labels),
      'Map review to one or more categories.'
    )
  ) AS matched_label
FROM Reviews AS A;
```
Complexity: Rewriting reduces LLM calls from $O(|A| \cdot |B|)$ to $O(|A|)$.
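For intuition on this call-count reduction, the hypothetical client-side sketch below mirrors the rewritten query; ai_classify stands in for a single multi-label model invocation and is not the actual Cortex API.

```python
def semantic_join_rewritten(reviews, categories, ai_classify):
    """Linear-cost sketch: one multi-label classification call per row of A
    (Reviews), instead of one AI_FILTER call per (review, category) pair."""
    labels = [c["label"] for c in categories]          # ARRAY_AGG(label)
    pairs, num_calls = [], 0
    for review in reviews:
        matched = ai_classify(review["text"], labels)  # one model call per review
        num_calls += 1
        pairs.extend((review, label) for label in matched)  # EXPLODE step
    # A naive AI_JOIN would need len(reviews) * len(categories) calls instead.
    return pairs, num_calls
```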
Empirics: In evaluation on eight tasks, the average runtime drops from 2330 s (quadratic join) to 41 s (rewritten; ≈30× improvement). F1 score rises from 0.412 to 0.596 (+44.7%). Per-dataset speedups range 15.2–69.5×. Often, multi-label classification also improves join quality due to superior context-aware reasoning in the LLM prompt.
5. Production Deployment and Representative Use Cases
Cortex AISQL is deployed at-scale in Snowflake, serving multi-tenant workloads within the existing warehouse and storage separation paradigm.
- Operational Observations:
- For July–September 2025, AI operators constitute >80% of total AISQL execution cost.
- Multi-table queries are disproportionately expensive: while 40% of queries span ≥2 tables, these account for 58% of total execution time.
- Prominent Workloads:
- Analytics: semantic grouping and summarization of customer support transcripts.
- Search: AI_FILTER on product descriptions; contextual joins with recommendation tables.
- Content Understanding: AI_AGG and AI_SUMMARIZE_AGG applied to user reviews, technical documents, and social media content.
These production metrics confirm the practical scalability and operational cost profile of semantic SQL at enterprise scale.
6. Limitations and Future Directions
Several challenges remain in optimizing and generalizing semantic SQL processing:
- Cost and Selectivity Estimation: Compile-time selectivity estimations for LLM-based predicates remain imprecise, especially for skewed or highly variable data. This impacts the effectiveness of both operator placement and resource provisioning.
- Re-Optimization: There is a need for dynamic query plan adaptation based on observed mid-query selectivity, potentially via feedback loops.
- Extension of Cascade Logic: Current cascades are primarily for Boolean predicates (AI_FILTER); generalizing to AI_CLASSIFY (multi-label) and AI_COMPLETE remains to be robustly addressed.
- Granular Cost Models: Future systems should resolve per-token vs. per-row billing, accommodate multimodal cost (text vs. image), and reflect model heterogeneity acutely in planning.
- Intermediate Result Caching: Efficient reuse and caching of prior LLM outputs across queries are critical for amortizing inference expense.
- Joint Operator Optimization: Exploring a richer optimization space, including joint AI/traditional operator co-optimization, explicit handling of nested queries, and user-defined AI logic, is a prominent direction.
In summary, Cortex AISQL operationalizes a triad of innovations—AI-aware query planning, adaptive model cascades, and semantic join rewriting—demonstrating that LLM-powered semantic computation can be deeply fused into declarative SQL frameworks while achieving significant gains in both performance and predictive quality over naive approaches (Aggarwal et al., 10 Nov 2025).