WWW2026

UTAG: Leveraging LLM as a Unified Embedding Generator for Text-Attributed Graphs

Mingqian Ding, Jianjun Li, Zhiyuan Ma, Liwei Zhang, Wenqi Yang

Abstract

Large Language Models (LLMs) have shown significant potential for learning on text-attributed graphs (TAGs). However, existing LLM-enhanced approaches to TAG learning mainly use LLMs as text augmenters, which incurs substantial computational overhead and may introduce noise, while their capacity as embedding generators remains underutilized. Furthermore, most current methods are designed for specific TAG instances and lack cross-domain transferability. To overcome these limitations, we propose UTAG, a novel framework that leverages LLMs as unified embedding generators. First, we design a dual-view contrastive fine-tuning approach for cross-domain pretraining. The in-domain view promotes local consistency via adaptive positive sample selection, while the cross-domain view establishes alignment through subspace clustering-based cross-domain prototype aggregation. By jointly pretraining on multiple TAG datasets from diverse domains, we obtain a unified LLM encoder that produces transferable representations. Second, we introduce a graph-specific adaptation training technique that performs two-stage dimensionality reduction under carefully designed semantic and Laplacian reconstruction constraints. This incorporates structural signals effectively while preserving as much semantic information as possible. Additionally, we employ cross-space alignment to further enhance consistency. Extensive experiments demonstrate that UTAG substantially outperforms existing LLM-enhanced methods, achieving an average improvement of 5.9% on cross-domain transfer tasks.
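
For readers unfamiliar with the two families of objectives named above, the following is a minimal illustrative sketch, not UTAG's actual losses, which the abstract does not specify. It assumes a standard InfoNCE-style contrastive loss as a stand-in for the contrastive fine-tuning objective, and the graph Laplacian quadratic form tr(Z^T L Z) as a stand-in for a Laplacian-based structural constraint. The function names `info_nce` and `laplacian_quadratic_form` and the `temperature` parameter are hypothetical.

```python
import numpy as np


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in, not the paper's exact loss).

    Row i of `positives` is the positive for row i of `anchors`; every other
    row in the batch acts as a negative.
    """
    # L2-normalize so the dot product is cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    # Subtract the row-wise max for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching (diagonal) pairs are the positives.
    return float(-np.mean(np.diag(log_prob)))


def laplacian_quadratic_form(z, adj):
    """tr(Z^T L Z) with L = D - A (an assumed stand-in structural term).

    For a symmetric adjacency matrix this equals
    0.5 * sum_ij A_ij * ||z_i - z_j||^2, so it is small when connected
    nodes have similar embeddings.
    """
    lap = np.diag(adj.sum(axis=1)) - adj
    return float(np.trace(z.T @ lap @ z))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(4, 8))                    # toy node embeddings
    adj = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=float)    # toy path graph
    print("contrastive:", info_nce(z, z + 0.01 * rng.normal(size=z.shape)))
    print("structural: ", laplacian_quadratic_form(z, adj))
```

In this sketch the contrastive term pulls each embedding toward its designated positive, and the Laplacian term penalizes embedding differences across edges; how UTAG selects positives, aggregates prototypes, and formulates its reconstruction constraints is described only at a high level in the abstract.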