ICLR 2026

TokenSeek: Memory Efficient Fine Tuning via Instance-Aware Token Ditching

Runjia Zeng, Qifan Wang, Qiang Guan, Ruixiang Tang, Lifu Huang, Zhenting Wang, Xueling Zhang, Cheng Han, Dongfang Liu

Abstract

Fine-tuning is widely regarded as the de facto approach for adapting large language models (LLMs) to downstream tasks. However, the high training memory consumption inherited from LLMs makes this process generally inefficient. Among existing memory-efficient approaches, activation-related optimization has proven particularly effective, as activations consistently dominate overall memory consumption. Although prior work offers various activation optimization strategies, these methods typically apply a single uniform, inflexible strategy across all instances. This data-agnostic nature ultimately results in ineffective and unstable fine-tuning. To solve this problem, we propose TOKENSEEK, a universal plugin solution that is suitable for various Transformer-based models through instance-aware token seeking and ditching. TOKENSEEK achieves significant fine-tuning memory savings (e.g., requiring only 2.8 GB, 14.8% of the original memory on Llama3.2 1B) with on-par or even superior performance. Furthermore, our interpretable token seeking process reveals the underlying factors behind its effectiveness, offering valuable insights for future research on token-efficient fine-tuning. Homepage: runjia.tech/iclr_tokenseek.