KDD 2025

Boost the Performance of Tabular Data Models with GPU Accelerated Feature Engineering

Chris Deotte, Ronay Ak

Abstract

Feature engineering remains a crucial technique for improving the performance of models trained on tabular data. Unlike computer vision and natural language processing, where deep learning models automatically extract hierarchical features from raw data, the most accurate tabular models, such as gradient-boosted decision trees, still benefit significantly from manually crafted features. This is demonstrated by Team NVIDIA's many first-place finishes in data science competitions.