Multi-Set Convolutional Network (MSCN)

Updated 12 May 2026

MSCN is a deep learning model for cardinality estimation that represents queries as unordered sets of tables, joins, and predicates.
It employs permutation-invariant deep set operations with average pooling and MLPs to efficiently capture correlations and handle sparse, zero-tuple scenarios.
Empirical results show that MSCN reduces tail errors relative to traditional sampling and histogram methods while offering a compact, scalable solution for complex relational queries.

The Multi-Set Convolutional Network (MSCN) is a deep learning architecture designed for cardinality estimation in relational databases, as introduced in "Learned Cardinalities: Estimating Correlated Joins with Deep Learning" (Kipf et al., 2018). MSCN represents relational queries as sets of tables, join predicates, and base-table predicates, capturing set semantics via neural permutation invariance. It improves on traditional sampling and histogram-based estimators by addressing challenges such as sparse selectivity (zero-tuple situations), capturing join-crossing correlations, and scalability in footprint and computation.

1. Query Representation as Unordered Sets

MSCN encodes each conjunctive select-join query $q \in Q$ as a triple of sets $(T_q, J_q, P_q)$ :

$T_q \subseteq T$ : the set of base tables involved,
$J_q \subseteq J$ : the set of join predicates (primarily foreign-key = primary-key),
$P_q \subseteq P$ : the set of selection predicates, formatted as (column, operator, value).

Each element is featurized as follows:

Table $t \in T$ : one-hot vector $v_t \in \{0,1\}^{|T|}$ , optionally concatenated with either a scalar $s_t$ (qualifying sample count) or a bitmap $b_t \in \{0,1\}^S$ (bitmask for $S$ materialized samples).
Join $j \in J$ : one-hot vector $v_j \in \{0,1\}^{|J|}$ .
Predicate $(\mathrm{col}, \mathrm{op}, \mathrm{val}) \in P$ : one-hot vectors for $\mathrm{col}$ and $\mathrm{op}$ ( $\{0,1\}^3$ for $<$ , $=$ , $>$ ), plus a normalized $\mathrm{val} \in [0,1]$ , derived by linear scaling within column min-max.

This representation supports query featurizations such as:

Tables: $t \in T_q$ each as one-hot.
Joins: $j \in J_q$ as one-hot.
Predicates: $(\mathrm{col}, \mathrm{op}, \mathrm{val}) \in P_q$ as one-hot column, one-hot operator $\mathrm{op} \in \{<, =, >\}$ , $\mathrm{val} \in [0,1]$ .
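
The featurization described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the schema, join list, column ranges, and the helper names `one_hot`, `featurize_predicate`, and `featurize_query` are all hypothetical.

```python
# Hypothetical toy schema; every name and value range here is illustrative only.
TABLES = ["title", "movie_companies", "cast_info"]
JOINS = ["title.id=movie_companies.movie_id", "title.id=cast_info.movie_id"]
COLUMNS = {  # column -> (min, max), used for min-max scaling of literals
    "title.production_year": (1880, 2020),
    "movie_companies.company_id": (1, 100),
}
OPERATORS = ["<", "=", ">"]


def one_hot(item, vocabulary):
    """Return a one-hot list marking the position of item in vocabulary."""
    return [1.0 if item == v else 0.0 for v in vocabulary]


def featurize_predicate(column, operator, value):
    """Concatenate one-hot column, one-hot operator, and the scaled value."""
    lo, hi = COLUMNS[column]
    normalized = (value - lo) / (hi - lo)
    return one_hot(column, list(COLUMNS)) + one_hot(operator, OPERATORS) + [normalized]


def featurize_query(tables, joins, predicates):
    """Map a query to its three feature sets (tables, joins, predicates)."""
    return (
        [one_hot(t, TABLES) for t in tables],
        [one_hot(j, JOINS) for j in joins],
        [featurize_predicate(*p) for p in predicates],
    )


table_set, join_set, predicate_set = featurize_query(
    ["title", "movie_companies"],
    ["title.id=movie_companies.movie_id"],
    [("title.production_year", ">", 2010), ("movie_companies.company_id", "=", 5)],
)
```

Each of the three returned lists is treated as a set: the order of its elements carries no meaning, which is what the pooling step in the next section relies on.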

2. Multi-Set Convolutional Operator

MSCN employs the Deep Sets theorem, representing a permutation-invariant function $f$ on set $X$ as $f(X) = \rho\left(\sum_{x \in X} \phi(x)\right)$ . For each query:

$w_T = \frac{1}{|T_q|} \sum_{t \in T_q} \mathrm{MLP}_T(v_t)$
$w_J = \frac{1}{|J_q|} \sum_{j \in J_q} \mathrm{MLP}_J(v_j)$
$w_P = \frac{1}{|P_q|} \sum_{p \in P_q} \mathrm{MLP}_P(v_p)$

Here, $\mathrm{MLP}$ denotes a two-layer fully connected network with ReLU activation. Average pooling ensures stable embedding magnitude regardless of input cardinality. When using sampling bitmaps, $v_t$ includes the one-hot table ID concatenated with $s_t$ or $b_t$ .
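
A small pure-Python sketch may help make the pooling step concrete. It assumes untrained random weights, ReLU after both layers, and a zero vector for an empty set; these choices, along with the names `make_mlp`, `mlp`, and `pool_set`, are illustrative assumptions and not details taken from the paper.

```python
import random


def make_mlp(in_dim, hidden_dim, rng):
    """Build two fully connected layers with small random (untrained) weights."""
    def layer(n_in, n_out):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        bias = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
        return weights, bias
    return [layer(in_dim, hidden_dim), layer(hidden_dim, hidden_dim)]


def mlp(params, x):
    """Apply each layer followed by a ReLU (assumed for both layers)."""
    for weights, bias in params:
        x = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, bias)]
    return x


def pool_set(params, elements):
    """Embed every element with the shared MLP, then average the embeddings."""
    hidden_dim = len(params[-1][1])
    if not elements:
        # Assumption: an empty set (e.g. a query without joins) maps to zeros.
        return [0.0] * hidden_dim
    embedded = [mlp(params, e) for e in elements]
    return [sum(column) / len(embedded) for column in zip(*embedded)]


rng = random.Random(0)
table_mlp = make_mlp(in_dim=3, hidden_dim=4, rng=rng)
tables = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

w_T = pool_set(table_mlp, tables)
w_T_reversed = pool_set(table_mlp, list(reversed(tables)))  # same set, other order
```

Because the per-element embeddings are averaged, `w_T` and `w_T_reversed` agree up to floating-point rounding, which is the permutation invariance the set representation requires.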

3. Architecture and Cardinality Regression

The pooled embeddings $w_T$ , $w_J$ , and $w_P$ (each in $\mathbb{R}^d$ with $d$ the hidden dimension) are concatenated:

$w = [w_T, w_J, w_P] \in \mathbb{R}^{3d}$

This vector $w$ is processed by a final output MLP ( $w_{\mathrm{out}} = \mathrm{MLP}_{\mathrm{out}}(w)$ ) with a sigmoid activation in the last layer. The output $w_{\mathrm{out}} \in [0,1]$ is interpreted as a normalized log-cardinality estimate. Normalization is performed by scaling the ground-truth log-cardinality into the $[0,1]$ range over the training dataset:

$y = \frac{\log c - \min \log c}{\max \log c - \min \log c}$

where $c$ is the true cardinality and the minimum and maximum are taken over the training queries.

At inference, normalization is inverted to recover the actual predicted cardinality.
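
The normalization and its inverse can be sketched as follows. The function names are hypothetical, the natural logarithm is an assumption, and the sketch assumes every training cardinality is at least 1 and that the training set contains at least two distinct cardinalities (otherwise the denominator is zero).

```python
import math


def fit_log_bounds(cardinalities):
    """Return the min and max log-cardinality observed in the training set."""
    logs = [math.log(c) for c in cardinalities]
    return min(logs), max(logs)


def normalize(cardinality, lo, hi):
    """Scale a true cardinality to the [0, 1] training target."""
    return (math.log(cardinality) - lo) / (hi - lo)


def denormalize(y, lo, hi):
    """Invert the scaling to turn a network output back into a cardinality."""
    return math.exp(y * (hi - lo) + lo)


# Hypothetical training cardinalities, used only to fix the scaling bounds.
lo, hi = fit_log_bounds([1, 100, 10000])
y = normalize(100, lo, hi)        # target the network would be trained to output
c_hat = denormalize(y, lo, hi)    # recovers the cardinality from that output
```

Since the sigmoid output is confined to $[0,1]$, a sketch like this can never produce an estimate outside the range of cardinalities seen during training.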

4. Training Objective and Optimization

The network is trained to minimize the mean $q$-error over a query set $Q_{\mathrm{train}}$ :

$\mathcal{L} = \frac{1}{|Q_{\mathrm{train}}|} \sum_{q \in Q_{\mathrm{train}}} \text{q-error}(\hat{c}_q, c_q)$

$\text{q-error}(\hat{c}, c) = \max\left(\frac{\hat{c}}{c}, \frac{c}{\hat{c}}\right)$

where $\hat{c}$ is the de-normalized estimate and $c$ is the true cardinality. The mean $q$-error directly reflects the multiplicative error metric relevant for optimizers. Comparative experiments with MSE and geometric mean $q$-error showed the mean $q$-error delivered superior empirical results. Training uses the Adam optimizer (learning rate 0.001).
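
As a worked illustration of the metric, the following sketch computes the q-error for individual estimates and its mean over a batch. The function names and sample values are made up for the example, and both estimates and true cardinalities are assumed to be strictly positive.

```python
def q_error(estimate, truth):
    """Factor by which the estimate deviates from the truth (always >= 1)."""
    return max(estimate / truth, truth / estimate)


def mean_q_error(estimates, truths):
    """Average q-error over paired estimates and true cardinalities."""
    errors = [q_error(e, t) for e, t in zip(estimates, truths)]
    return sum(errors) / len(errors)


under = q_error(10, 100)    # underestimate by a factor of 10
over = q_error(100, 10)     # overestimate by a factor of 10
batch_loss = mean_q_error([10, 50], [100, 50])
```

Under- and overestimation by the same factor yield the same q-error, so the metric penalizes both directions symmetrically, and a perfect estimate contributes exactly 1.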

Key hyperparameters:

Hidden dimension $d = 256$ ,
Each MLP uses two layers,
Batch size 1024, trained for 100 epochs,
Materialized sample size $S = 1000$ tuples per table,
Training/validation split of 90,000 / 10,000 queries.

5. Integration of Sampling-Based Information

To leverage sampling strengths while mitigating zero-tuple failure modes, MSCN augments the table feature vector $v_t$ with either:

The scalar $s_t$ (number of qualifying samples among $S$ pre-materialized tuples),
The full $S$-bit bitmap $b_t \in \{0,1\}^S$ (records which tuples satisfy predicates).

When $s_t = 0$ ("0-tuple"), conventional sampling must revert to independence assumptions or uniformity heuristics, often resulting in large errors. MSCN's architecture, which retains the set structure and is exposed to zero-sample patterns during training, allows it to learn correlations (e.g., high selectivity conjunctions) unavailable to classical sampling alone.
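
The two sampling features can be illustrated with a short sketch that evaluates a query's predicates against a table's materialized sample. The sample rows, column names, and the helper `sample_bitmap` are hypothetical, and a four-row sample stands in for the much larger samples used in practice.

```python
import operator

# Comparison operators supported by the predicate featurization.
OPS = {"<": operator.lt, "=": operator.eq, ">": operator.gt}


def sample_bitmap(sample_rows, predicates):
    """Bit i is 1 if sample row i satisfies every (column, op, value) predicate."""
    return [
        int(all(OPS[op](row[col], val) for col, op, val in predicates))
        for row in sample_rows
    ]


# Hypothetical materialized sample of a "title" table.
title_sample = [
    {"production_year": 1995, "kind_id": 1},
    {"production_year": 2012, "kind_id": 1},
    {"production_year": 2015, "kind_id": 7},
    {"production_year": 2001, "kind_id": 7},
]

bitmap = sample_bitmap(title_sample, [("production_year", ">", 2010)])
s_t = sum(bitmap)  # scalar alternative: number of qualifying samples

# A more selective conjunction that no sample row satisfies ("0-tuple" case).
empty_bitmap = sample_bitmap(
    title_sample, [("production_year", ">", 2010), ("kind_id", "=", 3)]
)
```

In the 0-tuple case the bitmap is all zeros, so the sample alone says nothing about the true cardinality; the model then has to rely on the remaining query features and on what it learned from similar queries during training.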

6. Capturing Join-Crossing Correlations

Traditional histograms and sampling-based approaches generally assume independence across joins or have limited ability to handle absence of join-qualifying samples. MSCN's parallel aggregation of tables, joins, and predicates allows the final output MLP to integrate cross-signals and detect join-crossing correlations. For example, the model can associate patterns such as "French actors (Person.nationality=FR) are disproportionately in romantic movies," a dependency structure that is inaccessible to attribute-wise statistics and conventional index probing. This architecture enables the network to address limitations that "cripple all traditional estimators," particularly in multi-table, correlated settings.

7. Empirical Performance, Limitations, and Future Directions

Empirical benchmarks on the IMDb dataset (2.5M movies, 4M actors) included synthetic (5,000 queries, 0–2 joins), scale (500 queries, generalization to 3–4 joins), and JOB-light (70 real-world queries). On the synthetic workload using bitmaps:

Median $q$-error: 1.18 (compared to 1.69 for PostgreSQL, 1.89 for RS, 1.09 for IBJS),
$95^{\text{th}}$ percentile $q$-error: 6.84 (vs. 23.9 / 53.4 / 33.2),
Mean $q$-error: 2.89 (vs. 154 / 125 / 118).

On highly selective "0-tuple" queries (376 queries), MSCN achieved a median $q$-error of 2.94, outperforming RS (9.13) and PostgreSQL (4.78). For scalability, MSCN's $95^{\text{th}}$ percentile $q$-error grew from 38.6 (3 joins) to 2397 (4 joins), still outperforming PostgreSQL. For JOB-light queries, the median $q$-error was 3.82 (PostgreSQL: 7.93, RS: 11.5, IBJS: 1.59) and the $95^{\text{th}}$ percentile was 362 (vs. PostgreSQL: 1104, RS: 4073).

Model footprint remains under 3 MiB, a marked efficiency compared to full indexes as used by IBJS.

Limitations include:

Diminished generalization for queries diverging from the training join and predicate distributions,
Lack of support for complex predicates (e.g., LIKE, disjunctions), which require fallback to bitmap or histogram features,
Point-estimate predictions only (i.e., absence of uncertainty quantification),
Static snapshot assumption, so adapting to schema/data changes requires retraining or risk-prone online updates,
Restriction to numeric/string attribute encoding via hashing, needing enriched supervision.

Potential directions cited include uncertainty estimation via deep ensembles or dropout, adaptive query generation targeting high-error schema regions, and extending predicate-level bitmap support.

MSCN by design leverages set invariance and sampling signals in a compact neural model. This permits reductions in both median and tail cardinality estimation errors, robustness to zero-sample regimes, and recognition of complex data correlations, positioning MSCN as a first successful step towards learned, memory-efficient cardinality estimation in relational database management systems (Kipf et al., 2018).

References (1)

Learned Cardinalities: Estimating Correlated Joins with Deep Learning (2018)
