WWW2026

KMLP: A Scalable Hybrid Architecture for Web-Scale Tabular Data Modeling

Mingming Zhang, Pengfei Shi, Junbo Zhao, Ningtao Wang, Feng Zhao, Guandong Sun, Yulin Kang, Xing Fu, Zhiqing Xiao, Weiqiang Wang, Ruizhe Gao

Abstract

Predictive modeling on web-scale tabular data poses significant scalability challenges for industrial applications, which often involve billions of instances and hundreds of heterogeneous numerical features. These features are characterized by anisotropy, heavy-tailed distributions, and non-stationarity. Such properties not only create bottlenecks in the training efficiency and scalability of mainstream models such as Gradient Boosting Decision Trees (GBDTs), but also force practitioners into laborious, expert-dependent manual feature engineering. To address this challenge, we introduce KMLP, a novel hybrid deep architecture that combines a shallow Kolmogorov-Arnold Network (KAN) front-end with a Gated Multilayer Perceptron (gMLP) backbone. The KAN front-end uses learnable activation functions to model a complex non-linear transformation for each input feature in an end-to-end manner, thereby automating feature representation learning. The gMLP backbone then captures high-order interactions among these refined representations. Extensive experiments on multiple public benchmarks and an ultra-large-scale industrial web dataset with billions of samples demonstrate that KMLP achieves state-of-the-art (SOTA) performance. Crucially, KMLP's performance advantage over strong baselines such as GBDTs becomes more pronounced as the data scale increases. These results support KMLP as a scalable and adaptive deep learning paradigm and a promising direction for modeling large-scale, dynamic web tabular data.
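
As a rough illustration of the two-stage design described in the abstract, the following NumPy sketch runs a forward pass through a KAN-style front-end followed by gated MLP blocks. It is not the authors' implementation, and the abstract does not specify these details. The Gaussian RBF basis stands in for the spline parameterization commonly used in KANs, the gated block is a simplified channel-gating variant of gMLP without normalization, and all names (`init_params`, `kan_front_end`, `gated_block`, `kmlp_forward`), layer sizes, and initialization scales are hypothetical.

```python
import numpy as np


def init_params(n_features, n_basis=8, d_model=16, d_ffn=32, n_blocks=2, seed=0):
    """Random parameters for the sketch; all sizes are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    blocks = []
    for _ in range(n_blocks):
        blocks.append((
            rng.normal(0.0, 0.1, (d_model, 2 * d_ffn)),  # w_in: expand into two halves
            rng.normal(0.0, 0.1, (d_ffn, d_ffn)),        # w_gate: projection inside the gate
            np.ones(d_ffn),                              # b_gate: gate starts close to pass-through
            rng.normal(0.0, 0.1, (d_ffn, d_model)),      # w_out: project back to d_model
        ))
    return {
        "centers": np.linspace(-3.0, 3.0, n_basis),      # RBF centres shared by all features
        "coef": rng.normal(0.0, 0.1, (n_features, n_basis, d_model)),
        "blocks": blocks,
        "w_head": rng.normal(0.0, 0.1, d_model),
    }


def kan_front_end(x, centers, coef, width=1.0):
    """One KAN-style layer: a learnable univariate function per (feature, output) edge.

    x: (batch, n_features); coef: (n_features, n_basis, d_out).
    Each edge function is phi_{j,o}(x_j) = sum_k coef[j, k, o] * basis_k(x_j),
    and output o is the sum of phi_{j,o}(x_j) over features j.
    """
    # basis[b, j, k] = exp(-((x[b, j] - centers[k]) / width) ** 2)
    basis = np.exp(-((x[:, :, None] - centers[None, None, :]) / width) ** 2)
    return np.einsum("bjk,jko->bo", basis, coef)


def gated_block(h, w_in, w_gate, b_gate, w_out):
    """Simplified gated MLP block with a residual connection."""
    z = np.maximum(h @ w_in, 0.0)          # ReLU used here for brevity
    u, v = np.split(z, 2, axis=-1)         # split channels into value and gate halves
    gated = u * (v @ w_gate + b_gate)      # multiplicative gating mixes channels
    return h + gated @ w_out


def kmlp_forward(x, params):
    """KAN front-end, then gated blocks, then a sigmoid head for binary prediction."""
    h = kan_front_end(x, params["centers"], params["coef"])
    for block in params["blocks"]:
        h = gated_block(h, *block)
    logit = h @ params["w_head"]
    return 1.0 / (1.0 + np.exp(-logit))


x = np.random.default_rng(1).standard_normal((4, 5))  # 4 rows, 5 numerical features
probs = kmlp_forward(x, init_params(n_features=5))
print(probs.shape)  # (4,)
```

In this sketch the front-end sees raw feature values directly, so the per-feature transformation is learned through `coef` rather than hand-crafted. With a bounded basis such as the RBF used here, extreme heavy-tailed inputs map to near-zero activations instead of dominating the output. Whether KMLP handles outliers this way is not stated in the abstract.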