SIGMOD 2025
AixelNet: A Pre-trained Model with Table-aware Adaptation for Structured Data Prediction
Liming Wang, Meihui Zhang, Zhaojing Luo
Abstract
Structured data prediction supports key applications in healthcare, finance, and e-commerce, among others. As structured data becomes increasingly heterogeneous and complex, there is a growing need for scalable models that can generalize across diverse tables. Traditional machine learning and deep learning models for structured data often have task-specific architectures that require computationally expensive retraining for each new prediction task. Consequently, recent studies explore pre-training for structured data, which requires no additional training for a new task. While pre-training has achieved great success in NLP and CV, research on pre-training for structured data remains preliminary because structured data has unique characteristics, e.g., complicated correlations and dependencies between features. Existing studies adopt a fixed pre-trained model architecture and pay little attention to table-specific characteristics, which limits their ability to adapt to new prediction tasks. To address this challenge, we propose AixelNet, a pre-trained model for structured data prediction that supports table-aware adaptation. First, we design a meta feature extraction module that summarizes table-level characteristics to enable model customization. Second, instead of relying on a single model, AixelNet adopts a multi-model framework with multiple base predictors to capture diverse feature interaction patterns across tables. Finally, a hypernetwork dynamically assigns these base predictors to each table based on its extracted meta features, enabling flexible, table-aware model composition. To further improve generalization and efficiency, we design regularization methods that encourage balanced predictor usage and diversified learned representations, together with a sparse update strategy that updates only the relevant predictors during pre-training.
Extensive experiments on 20 classification and 20 regression tasks on tables confirm AixelNet's effectiveness in improving predictive performance compared to eight state-of-the-art baselines and demonstrate its efficiency.
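
The table-aware composition described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes tables are already encoded as fixed-width numeric matrices, uses linear maps as stand-ins for the base predictors, and replaces the hypernetwork with a single linear gate followed by top-k selection. All names (`meta_features`, `gate`, `predict`, `balance_penalty`) and the chosen meta statistics are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def meta_features(X):
    """Summarize a table (rows x numeric columns) as a fixed-length vector.
    The statistics here are an assumption, not the paper's meta features."""
    return np.array([
        np.log1p(X.shape[0]),      # log of the number of rows
        np.log1p(X.shape[1]),      # log of the number of columns
        X.mean(axis=0).mean(),     # average column mean
        X.std(axis=0).mean(),      # average column standard deviation
    ])

def softmax(z):
    z = z - z.max()                # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def gate(meta, W, k):
    """Stand-in for the hypernetwork: score every predictor from the
    meta features, keep the top-k, and normalize their weights."""
    scores = W @ meta
    top = np.argsort(scores)[-k:]
    weights = np.zeros_like(scores)
    weights[top] = softmax(scores[top])
    return weights

def predict(X, predictors, weights):
    """Weighted combination of the selected (non-zero weight) predictors.
    Unselected predictors are never evaluated, mirroring sparse usage."""
    active = np.flatnonzero(weights)
    return sum(weights[i] * (X @ predictors[i]) for i in active)

def balance_penalty(weight_rows):
    """Squared coefficient of variation of mean predictor usage across
    tables; zero when all predictors are used equally (assumed form)."""
    usage = np.mean(weight_rows, axis=0)
    return usage.var() / (usage.mean() ** 2)

# Toy setup: 4 linear base predictors over 5-dimensional encoded rows.
n_predictors, d, k = 4, 5, 2
W = rng.normal(size=(n_predictors, 4))           # gate parameters
predictors = rng.normal(size=(n_predictors, d))  # one weight vector each
table = rng.normal(size=(100, d))

w = gate(meta_features(table), W, k)
y_hat = predict(table, predictors, w)
```

In this sketch only the `k` selected predictors contribute to a table's prediction, so a training loop would update only those predictors for that table, and `balance_penalty` would be added to the loss to discourage the gate from always choosing the same ones.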