KDD 2025

Range-limited Augmentation for Few-shot Learning in Tabular Data with Comprehensive Benchmark

Kyungeun Lee, Moonjung Eo, Hye-Seung Cho, Min-Kook Suh, Seoyoon Kim, Ye Seul Sim, Suhee Yoon, Sanghyu Yoon, Woohyung Lim

Cited by 1

Abstract

Few-shot learning is crucial for tabular data, where the high cost of annotation often limits the availability of labeled samples. Despite its importance in real-world applications such as healthcare and finance, few-shot learning in tabular domains has received limited attention. To address this, we introduce range-limited augmentation, a novel augmentation strategy for contrastive learning that perturbs numerical features within predefined feature-specific ranges. Unlike conventional augmentations, which may produce false positive pairs during contrastive learning, our approach ensures semantic consistency by restricting each perturbation to the range of the feature it modifies. A quantitative analysis confirms that range-limited augmentation preserves task-relevant information better than existing augmentation techniques. Additionally, we propose FeSTa (Few-Shot Tabular classification benchmark), the first large-scale benchmark designed to systematically evaluate few-shot learning methods on tabular data. FeSTa includes 50 datasets and 32 algorithms spanning supervised, unsupervised, self-supervised, semi-supervised, and foundation models. Experiments on FeSTa show that range-limited augmentation consistently ranks among the top methods, achieving an average rank of 2.6 out of 32 in 1-shot classification, despite not relying on large-scale pretraining or complex architectures. The benchmark code is available at https://github.com/kyungeun-lee/festa.git.
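To make the idea of perturbing numerical features within feature-specific ranges concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how the ranges are defined, so this sketch assumes they are per-feature quantile bins and that a perturbed value is redrawn uniformly from the bin containing the original value. The function names `fit_bin_edges` and `range_limited_augment` and the parameters `n_bins` and `mask_prob` are hypothetical.

```python
import numpy as np


def fit_bin_edges(X, n_bins=4):
    """Compute per-feature quantile bin edges.

    ASSUMPTION: quantile bins stand in for the paper's "predefined
    feature-specific ranges"; the abstract does not specify them.
    Returns an array of shape (n_features, n_bins + 1).
    """
    qs = np.linspace(0.0, 1.0, n_bins + 1)
    return np.quantile(np.asarray(X, dtype=float), qs, axis=0).T


def range_limited_augment(X, edges, mask_prob=0.5, rng=None):
    """Resample selected entries uniformly within their own bin.

    Each entry is perturbed with probability `mask_prob` (hypothetical
    parameter). A perturbed value never leaves the bin of the original
    value, provided the original lies inside the fitted edges.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    X_aug = X.copy()
    n_bins = edges.shape[1] - 1
    for j in range(X.shape[1]):
        # Index of the bin that contains each value of feature j.
        idx = np.searchsorted(edges[j], X[:, j], side="right") - 1
        idx = np.clip(idx, 0, n_bins - 1)
        low, high = edges[j, idx], edges[j, idx + 1]
        resampled = rng.uniform(low, high)
        # Choose which rows of this feature are perturbed.
        perturb = rng.random(X.shape[0]) < mask_prob
        X_aug[:, j] = np.where(perturb, resampled, X[:, j])
    return X_aug


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 3))
    edges = fit_bin_edges(X, n_bins=4)
    # Two independently augmented views of the same rows.
    view_a = range_limited_augment(X, edges, rng=rng)
    view_b = range_limited_augment(X, edges, rng=rng)
    print(view_a.shape, view_b.shape)
```

In a contrastive setup, two such views of the same row would be treated as a positive pair. Because each value stays inside its original bin, both views remain close to the source sample under this assumed definition of a range.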