WWW2025

Compress and Mix: Advancing Efficient Taxonomy Completion with Large Language Models

Hongyuan Xu, Yuhang Niu, Yanlong Wen, Xiaojie Yuan


Abstract

Taxonomy completion aims to integrate new concepts into an existing taxonomy by determining their appropriate hypernyms and hyponyms. While both semantic and structural information are crucial for this task, existing approaches often struggle to balance the two effectively. In this paper, we propose COMI, an efficient taxonomy completion framework that leverages large language models (LLMs) to capture semantic and structural information in a unified manner. COMI compresses node semantics into token representations, enabling LLMs to efficiently process an input structure composed of these tokens. To enhance the model's understanding of the structure, COMI is further fine-tuned with contrastive learning and mixup data augmentation, where mixup generates diverse and challenging negative samples. Through these innovations, COMI better integrates semantic and structural information, leading to more accurate taxonomy completion. Experimental results on three real-world datasets demonstrate that COMI achieves state-of-the-art performance while offering up to 284x faster inference than the previous best method. Our code and compressed tokens are available at https://github.com/cyclexu/COMI.
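The abstract does not specify how mixup produces negatives. The Python sketch below illustrates one generic way that mixup-generated negatives can feed a contrastive objective. It is not COMI's implementation; the repository above contains that.

The sketch makes several assumptions:

- Candidates are represented as plain unit-norm embedding vectors.
- Each real negative is interpolated toward the positive with a coefficient `lam` drawn from `[0, lam_max]`, with `lam_max < 0.5`. Each synthetic negative therefore moves closer to the positive (harder) but stays nearer its source negative.
- The loss is a standard InfoNCE objective with temperature `tau`.

The names `mixup_negatives`, `info_nce`, `lam_max`, `tau` and the toy vectors are all hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def mixup_negatives(positive: Vector, negatives: List[Vector],
                    lam_max: float = 0.4, seed: int = 0) -> List[Vector]:
    """Hypothetical helper: build one synthetic negative per real negative.

    Each negative is interpolated toward the positive with lam in [0, lam_max].
    Assuming lam_max < 0.5, the mixed vector is harder (closer to the positive)
    than its source negative but still closer to that negative than to the positive.
    """
    rng = random.Random(seed)
    mixed = []
    for neg in negatives:
        lam = rng.uniform(0.0, lam_max)
        mixed.append(normalize([lam * p + (1.0 - lam) * n
                                for p, n in zip(positive, neg)]))
    return mixed


def info_nce(query: Vector, positive: Vector, negatives: List[Vector],
             tau: float = 0.1) -> float:
    """InfoNCE loss: negative log-softmax score of the positive among all candidates."""
    logits = [dot(query, positive) / tau] + [dot(query, n) / tau for n in negatives]
    m = max(logits)  # subtract the max logit for numerical stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


# Toy usage with made-up 3-d embeddings (illustrative only).
query = normalize([1.0, 0.2, 0.0])
positive = normalize([0.9, 0.3, 0.1])
negatives = [normalize([0.0, 1.0, 0.0]), normalize([0.1, 0.0, 1.0])]

hard = mixup_negatives(positive, negatives)
loss_plain = info_nce(query, positive, negatives)
loss_mixed = info_nce(query, positive, negatives + hard)
print(f"plain: {loss_plain:.4f}  with mixup negatives: {loss_mixed:.4f}")
```

Appending the synthetic negatives only adds terms to the softmax denominator, so the loss can only grow. The model is pushed to separate the positive from candidates that lie between it and the original negatives.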