KDD2025

Seeing the Unseen in Micro-Video Popularity Prediction: Self-Correlation Retrieval for Missing Modality Generation

Zhangtao Cheng, Jian Lang, Ting Zhong, Fan Zhou

被引用 5 次

摘要

Micro-video popularity prediction (MVPP) plays a crucial role in numerous real-world applications, including product marketing and recommendation systems. While existing methodologies predominantly assume complete modalities during multimodal learning, this assumption often fails to hold in practical scenarios due to various constraints, such as privacy concerns or data integrity issues. To address this limitation, we propose SCRAG, a novel Self-Correlation Retrieval-Augmented Generative framework designed to enhance missing-modality robustness in MVPP. SCRAG operates in a retrieval-guided generation manner that explores relevant knowledge to enhance the reconstruction of missing content, which consists of two primary components: (1) a self-correlation retriever and (2) a multimodal mixture-of-experts generator. It first acquires instances pertinent to the missing content through multimodal prompt alignment. Subsequently, the generator extracts contextual modal information from the retrieved context-rich instances. By learning the joint distribution of modalities, SCRAG effectively recovers missing content and addresses the modal heterogeneity challenge inherent in cross-modal generation approaches. Extensive experiments conducted on three real-world datasets demonstrate that SCRAG consistently outperforms state-of-the-art baselines, underscoring its effectiveness in handling incomplete modalities and improving the accuracy of micro-video popularity prediction.