EMNLP 2025
Few-Shot Open-Set Classification via Reasoning-Aware Decomposition
Avyav Kumar Singh, Helen Yannakoudakis
Abstract
Large language models (LLMs) excel at few-shot learning, but their ability to reject out-of-distribution examples remains under-explored. We study this challenge in the setting of few-shot open-set classification, where a model must not only classify examples from a small set of seen classes but also reject unseen ones at inference time. This setting is more realistic and challenging than traditional closed-set supervised learning, requiring both fine-grained classification and robust rejection. We show that, for small LLMs, neither chain-of-thought (CoT) prompting nor supervised fine-tuning (SFT) alone is sufficient to generalise reliably, particularly when class semantics are anonymised. We introduce Wasserstein GFN (W-GFN), a novel amortised Generative Flow Network framework that uses latent trajectories to approximate the Bayesian posterior. With as few as 4 examples per class, W-GFN substantially improves performance, enabling Llama 3.2 3B to achieve up to ≥ 80% of the performance of Llama 3.3 70B on complex datasets, despite being ∼23 times smaller. This highlights the importance of reasoning-aware approaches for robust open-set few-shot learning.

1 This is a lenient evaluation setting: any incorrect prediction to an out-of-set label is accepted as 'None of these', artificially inflating unseen F1. We also tell the model in the prompt that some examples may be out-of-set (Appendix A.1). Thus, these results represent an upper bound on unseen performance.

2 Anonymised labels are common practice in prior work (e.g., encoder-based classification and meta-learning; Liu et al. (2020); Snell et al. (2017); Bansal et al. (2020a)).

overfitting, especially when Y lacks semantic cues, by promoting abstraction (i.e., intermediate concepts) over memorisation.
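As an illustration of the lenient evaluation setting described in footnote 1, the following minimal Python sketch shows one possible reading of it: any predicted label that falls outside the seen label set is mapped to 'None of these' before the unseen-class F1 is computed. The function names (`normalise_prediction`, `f1_for_label`, `lenient_unseen_f1`) and the toy labels are hypothetical and are not taken from the paper; the authors' actual scoring code may differ.

```python
# Hypothetical sketch of lenient unseen-class scoring; not the authors' code.
NONE_LABEL = "None of these"


def normalise_prediction(pred, seen_labels):
    """Map any prediction outside the seen label set to the rejection label."""
    return pred if pred in seen_labels else NONE_LABEL


def f1_for_label(gold, preds, label):
    """Plain per-label F1 computed from true/false positives and negatives."""
    pairs = list(zip(gold, preds))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def lenient_unseen_f1(gold, raw_preds, seen_labels):
    """Unseen-class F1 after out-of-set predictions are accepted as rejections."""
    preds = [normalise_prediction(p, seen_labels) for p in raw_preds]
    return f1_for_label(gold, preds, NONE_LABEL)


# Toy data (assumed): two seen classes "A" and "B", two unseen examples.
seen = {"A", "B"}
gold = ["A", "B", NONE_LABEL, NONE_LABEL]
raw_preds = ["A", "C3", "banana", "A"]

strict = f1_for_label(gold, raw_preds, NONE_LABEL)
lenient = lenient_unseen_f1(gold, raw_preds, seen)
print(f"strict unseen F1:  {strict:.2f}")
print(f"lenient unseen F1: {lenient:.2f}")
```

On this toy data the model never emits the literal rejection string, so strict scoring credits no rejections, whereas lenient scoring counts the out-of-set outputs as rejections. This is consistent with the footnote's point that the reported unseen F1 is an upper bound.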