Emergent Mind

PyTorch Frame: A Modular Framework for Multi-Modal Tabular Learning

Published Mar 31, 2024 in cs.LG , cs.DB , and stat.ML


We present PyTorch Frame, a PyTorch-based framework for deep learning over multi-modal tabular data. PyTorch Frame makes tabular deep learning easy by providing a PyTorch-based data structure to handle complex tabular data, introducing a model abstraction to enable modular implementation of tabular models, and allowing external foundation models to be incorporated to handle complex columns (e.g., LLMs for text columns). We demonstrate the usefulness of PyTorch Frame by implementing diverse tabular models in a modular way, successfully applying these models to complex multi-modal tabular data, and integrating our framework with PyTorch Geometric, a PyTorch library for Graph Neural Networks (GNNs), to perform end-to-end learning over relational databases.
PyTorch Frame's architecture: Tensor Frame materialization, semantic encodings, interaction blocks, readout decoder head.


  • PyTorch Frame is a new framework designed for efficient multi-modal tabular learning, leveraging PyTorch for handling complex tabular data.

  • It introduces the Tensor Frame data structure for effective management of various data types and facilitates seamless data processing and embedding.

  • The framework includes mechanisms for encoding, column-wise interaction, and decoding to capture intricate data relationships and improve prediction tasks.

  • PyTorch Frame integrates with external foundational models and PyTorch Geometric (PyG) for enhanced handling of complex columns and relational data, demonstrating superior performance over traditional models.

Introduction to PyTorch Frame

The recently introduced PyTorch Frame offers an innovative solution for tabular deep learning, addressing the requirements for handling complex, multi-modal tabular data efficiently in deep learning applications. This PyTorch-based framework facilitates easy interaction with tabular data through a newly proposed data structure, Tensor Frame, alongside a modular implementation of diverse tabular models and seamless integration with external foundation models for complex column data processing.

Core Components of PyTorch Frame

Data Materialization

PyTorch Frame introduces Tensor Frame, a PyTorch-friendly data structure capable of effectively managing arbitrary complex columns by grouping column data based on semantic types. This transformation simplifies the handling of different data modalities, including numerical, categorical, multicategorical, timestamp, textual, and embedded types, enabling efficient data processing suitable for machine learning models.

Encoding Process

The encoding stage of PyTorch Frame transforms the materialized data into an embedded representation where each column is independently embedded into a uniform dimensional space. This process includes feature normalization and column-specific embedding techniques, catering to the unique characteristics of each semantic type.

Column-wise Interaction

Following the encoding, PyTorch Frame enacts a column-wise interaction mechanism that iteratively updates the embedding of each column by considering the information from other columns. This procedure enables the capture of intricate inter-column relationships within the tabular data, enriching the representational capacity of the encoded embeddings.

Decoding for Prediction

The final stage entails decoding the enriched column embeddings to generate row-wise embeddings that can be utilized directly for prediction tasks or as input to subsequent deep learning models. This decoding step summarizes the comprehensive interactions among columns, rendering a consolidated representation for each row in the table.

Advantages and Integrations

Integration with Foundational Models

A prominent feature of PyTorch Frame is its capacity to incorporate external foundational models, particularly for complex columns such as texts and images. By leveraging pre-trained models or enabling end-to-end fine-tuning, PyTorch Frame significantly enhances the handling and predictive modeling of multi-modal data.

Compatibility with PyTorch Geometric

PyTorch Frame seamlessly integrates with PyTorch Geometric (PyG) for learning over relational databases. This integration combines the strengths of tabular deep learning and GNNs, enabling end-to-end learning that exploits both tabular and relational data characteristics for improved prediction accuracy.

Empirical Validation

PyTorch Frame has been empirically tested across various datasets to demonstrate its efficacy in multi-modal tabular learning. The framework shows promising results in handling traditional datasets with numerical and categorical features, along with modern datasets containing complex columns and relational structures. Notably, the integration of PyTorch Frame with foundation models and PyG outperforms conventional models like LightGBM, especially in datasets enriched with textual information and relational data.


PyTorch Frame represents a significant advancement in tabular deep learning, offering a comprehensive, efficient, and flexible framework for handling complex multi-modal tabular data. By encapsulating the entire process from data materialization to prediction and enabling the integration with external models and PyG, PyTorch Frame paves the way for innovative applications in fields requiring sophisticated tabular data analysis.

Get summaries of trending AI papers delivered straight to your inbox

Unsubscribe anytime.

  1. Dnf-net: A neural architecture for tabular data. In International Conference on Learning Representations (ICLR)
  2. Optuna: A next-generation hyperparameter optimization framework. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp.  2623–2631
  3. TabNet: Attentive interpretable tabular learning. In AAAI Conference on Artificial Intelligence
  4. Blake, C. L. Uci repository of machine learning databases. http://www. ics. uci. edu/~ mlearn/MLRepository. html

  5. Language models are few-shot learners. volume 33, pp.  1877–1901
  6. ExcelFormer: A Neural Network Surpassing GBDTs on Tabular Data
  7. Learning to simulate complex physics with graph networks. In International Conference on Machine Learning (ICML), 2023b.
  8. XGBoost: A scalable tree boosting system. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp.  785–794
  9. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
  10. Codd, E. F. A relational model of data for large shared data banks. Communications of the ACM, 13(6):377–387
  11. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  12. Fast Graph Representation Learning with PyTorch Geometric
  13. Relational Deep Learning: Graph Representation Learning on Relational Databases
  14. Neural message passing for quantum chemistry. In International Conference on Machine Learning (ICML), pp. 1273–1272
  15. Revisiting deep learning models for tabular data. In Advances in Neural Information Processing Systems (NeurIPS)
  16. On embeddings for numerical features in tabular deep learning. In Advances in Neural Information Processing Systems (NeurIPS)
  17. Tabr: Tabular deep learning meets nearest neighbors. In International Conference on Learning Representations (ICLR)
  18. Why do tree-based models still outperform deep learning on typical tabular data? volume 35, pp.  507–520
  19. Array programming with numpy. Nature, 585(7825):357–362
  20. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.  770–778
  21. Open graph benchmark: Datasets for machine learning on graphs. In Advances in Neural Information Processing Systems (NeurIPS)
  22. Learning backward compatible embeddings. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp.  3018–3028
  23. TabTransformer: Tabular Data Modeling Using Contextual Embeddings
  24. Boost then Convolve: Gradient Boosting Meets Graph Neural Networks
  25. Lightgbm: A highly efficient gradient boosting decision tree. volume 30
  26. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR)
  27. RoBERTa: A Robustly Optimized BERT Pretraining Approach
  28. Text and Code Embeddings by Contrastive Pre-Training
  29. WaveNet: A Generative Model for Raw Audio
  30. Neural oblivious decision ensembles for deep learning on tabular data. In International Conference on Learning Representations (ICLR)
  31. Catboost: unbiased boosting with categorical features. volume 31
  32. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
  33. Benchmarking Multimodal AutoML for Tabular Data with Text Fields
  34. Masked Label Prediction: Unified Message Passing Model for Semi-Supervised Classification
  35. Tabular data: Deep learning is not all you need. Information Fusion, 81:84–90
  36. SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training
  37. Attention Is All You Need
  38. XTab: Cross-table Pretraining for Tabular Transformers

Show All 38

Test Your Knowledge

You answered out of questions correctly.

Well done!