CVPR2025

Knowledge Bridger: Towards Training-Free Missing Modality Completion

Guanzhou Ke, Shengfeng He, Xiaoli Wang, Bo Wang, Guoqing Chao, Yuanyang Zhang, Yi Xie, Hexing Su

Abstract

We report the knowledge extraction rules used in our method and the prompts used in knowledge-driven generation.

A.1. Prior Knowledge

To simplify the research problem, we use only a set of basic instructions to direct the large multimodal model (LMM) toward the desired multimodal content extraction. For the general domain, we predefine the extracted knowledge to include the major objects, the quantity of each object, and their corresponding attributes and styles. Given the strong reasoning capabilities of LMMs in the general domain, it is not necessary to differentiate between input modalities. However, to mitigate hallucinations generated by the LMM, we limit the number of major objects; in our experiments, extracting 5 to 7 objects performed best.
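The extraction rule above (major objects, their quantities, attributes and styles, capped at a small number of objects) can be sketched as a fixed instruction plus a structured parser. This is a minimal illustration under stated assumptions, not the paper's actual prompt or code: the names `ObjectKnowledge`, `build_extraction_prompt`, `parse_knowledge`, and `MAX_OBJECTS`, the prompt wording, and the JSON reply format are all hypothetical. The cap of 7 follows the 5 to 7 objects reported as best.

```python
import json
from dataclasses import dataclass, field

# Hypothetical cap reflecting the 5-7 major objects reported above;
# limiting the object count is meant to reduce LMM hallucinations.
MAX_OBJECTS = 7


@dataclass
class ObjectKnowledge:
    """Hypothetical record for one extracted major object."""
    name: str
    quantity: int
    attributes: list[str] = field(default_factory=list)
    style: str = ""


def build_extraction_prompt(max_objects: int = MAX_OBJECTS) -> str:
    """Hypothetical instruction text; the same prompt is used for any input
    modality, since no modality-specific rules are applied."""
    return (
        f"List at most {max_objects} major objects in the input. "
        "For each object, give its name, quantity, attributes, and style. "
        'Answer as a JSON list with keys "name", "quantity", '
        '"attributes", and "style".'
    )


def parse_knowledge(response: str, max_objects: int = MAX_OBJECTS) -> list[ObjectKnowledge]:
    """Parse an (assumed) JSON reply and keep only the first max_objects
    entries, enforcing the cap even if the model ignores the instruction."""
    items = json.loads(response)
    objects = [
        ObjectKnowledge(
            name=str(item["name"]),
            quantity=int(item.get("quantity", 1)),
            attributes=[str(a) for a in item.get("attributes", [])],
            style=str(item.get("style", "")),
        )
        for item in items
    ]
    return objects[:max_objects]


if __name__ == "__main__":
    # Simulated reply listing 9 objects; the parser truncates it to 7.
    reply = json.dumps([{"name": f"obj{i}", "quantity": 1} for i in range(9)])
    print(len(parse_knowledge(reply)))
```

Enforcing the cap in the parser as well as in the instruction is a design choice of this sketch: it keeps the downstream knowledge bounded even when the model returns more objects than requested.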