
4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs (2404.18209v1)

Published 28 Apr 2024 in cs.LG and cs.DB

Abstract: Although RDBs store vast amounts of rich, informative data spread across interconnected tables, the progress of predictive machine learning models as applied to such tasks arguably falls well behind advances in other domains such as computer vision or natural language processing. This deficit stems, at least in part, from the lack of established/public RDB benchmarks as needed for training and evaluation purposes. As a result, related model development thus far often defaults to tabular approaches trained on ubiquitous single-table benchmarks, or on the relational side, graph-based alternatives such as GNNs applied to a completely different set of graph datasets devoid of tabular characteristics. To more precisely target RDBs lying at the nexus of these two complementary regimes, we explore a broad class of baseline models predicated on: (i) converting multi-table datasets into graphs using various strategies equipped with efficient subsampling, while preserving tabular characteristics; and (ii) trainable models with well-matched inductive biases that output predictions based on these input subgraphs. Then, to address the dearth of suitable public benchmarks and reduce siloed comparisons, we assemble a diverse collection of (i) large-scale RDB datasets and (ii) coincident predictive tasks. From a delivery standpoint, we operationalize the above four dimensions (4D) of exploration within a unified, scalable open-source toolbox called 4DBInfer. We conclude by presenting evaluations using 4DBInfer, the results of which highlight the importance of considering each such dimension in the design of RDB predictive models, as well as the limitations of more naive approaches such as simply joining adjacent tables. Our source code is released at https://github.com/awslabs/multi-table-benchmark .
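To make the abstract's two modeling dimensions concrete, below is a minimal, self-contained sketch of the general idea behind RDB-to-graph conversion with subsampling: each table row becomes a typed node carrying its column values as features (preserving tabular characteristics), each foreign-key reference becomes a typed edge, and predictions are made from a sampled subgraph around a seed row rather than from a flat join. This is not the 4DBInfer API; the toy schema, edge-dictionary layout, and one_hop sampler are hypothetical illustrations only.

```python
# Illustrative sketch (NOT the 4DBInfer API): converting two linked tables
# into a heterogeneous graph and sampling a subgraph around seed rows.
# Table names, columns, and the sampler are hypothetical examples.
import pandas as pd
from collections import defaultdict

# Toy RDB: each row becomes a node; each foreign-key value becomes an edge.
users = pd.DataFrame({"user_id": [0, 1], "age": [34, 27]})
orders = pd.DataFrame({"order_id": [0, 1, 2],
                       "user_id": [0, 0, 1],       # FK -> users.user_id
                       "amount": [9.99, 4.50, 20.0]})

# Heterogeneous edge lists keyed by (src_table, fk_column, dst_table).
edges = defaultdict(list)
for _, row in orders.iterrows():
    edges[("orders", "user_id", "users")].append((row["order_id"], row["user_id"]))

def one_hop(seed_orders):
    """Return the 1-hop neighborhood of seed order rows (hypothetical sampler)."""
    hits = edges[("orders", "user_id", "users")]
    touched_users = {u for o, u in hits if o in seed_orders}
    return {"orders": set(seed_orders), "users": touched_users}

print(one_hop({0, 2}))   # {'orders': {0, 2}, 'users': {0, 1}}
```

Note that row features (age, amount) stay attached to their nodes instead of being denormalized into one wide table, which is the contrast the abstract draws against naively joining adjacent tables.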

Authors (20)
  1. Minjie Wang (35 papers)
  2. Quan Gan (31 papers)
  3. David Wipf (59 papers)
  4. Zhenkun Cai (8 papers)
  5. Ning Li (174 papers)
  6. Jianheng Tang (31 papers)
  7. Yanlin Zhang (5 papers)
  8. Zizhao Zhang (44 papers)
  9. Zunyao Mao (1 paper)
  10. Yakun Song (9 papers)
  11. Yanbo Wang (54 papers)
  12. Jiahang Li (19 papers)
  13. Han Zhang (338 papers)
  14. Guang Yang (422 papers)
  15. Xiao Qin (15 papers)
  16. Chuan Lei (16 papers)
  17. Muhan Zhang (89 papers)
  18. Weinan Zhang (322 papers)
  19. Christos Faloutsos (88 papers)
  20. Zheng Zhang (488 papers)
Citations (2)
