KDD2024

Scalable Graph Learning for your Enterprise

Hema Raghavan

Abstract

Much of the world's most valued data is stored in relational databases and data warehouses, where the data is organized into many tables connected by primary-foreign key relations. However, building machine learning models using this data is both challenging and time consuming. The core problem is that no machine learning method is capable of learning on multiple tables interconnected by primary-foreign key relations. Current methods can only learn from a single table, so the data must first be manually joined and aggregated into a single training table, the process known as feature engineering. Feature engineering is slow, error prone and leads to suboptimal models. At Kumo.ai we have worked with researchers worldwide to develop an end-to-end deep representation learning approach to directly learn on data laid out across multiple tables [1]. We name our approach Relational Deep Learning (RDL). The core idea is to view relational databases as a temporal, heterogeneous graph, with a node for each row in each table, and edges specified by primary-foreign key links. Message Passing Graph Neural Networks can then automatically learn across the graph to extract representations that leverage all input data, without any manual feature engineering. Our relational deep learning method to encode graph structure into low-dimensional embeddings brings several benefits: (1) automatic learning from the entire data spread across multiple relational tables (2) no manual feature engineering as the system learns optimal embeddings for a target problem; (3) state-of-the-art predictive performance.