EMNLP 2025

Memory-QA: Answering Recall Questions Based on Multimodal Memories

Hongda Jiang, Xinyuan Zhang, Siddhant Garg, Rishab Arora, Shiunzu Kuo, Jiayang Xu, Aaron Colak, Xin Luna Dong

Abstract

We introduce Memory-QA, a novel real-world task that involves answering recall questions about visual content from previously stored multimodal memories. This task poses unique challenges, including the creation of task-oriented memories, the effective use of temporal and location information within memories, and the ability to draw upon multiple memories to answer a recall question. To address these challenges, we propose a comprehensive pipeline, PENSIEVE, integrating memory-specific augmentation, time- and location-aware multi-signal retrieval, and multi-memory QA fine-tuning. We created a multimodal benchmark to illustrate various real challenges in this task, and show that PENSIEVE outperforms state-of-the-art solutions by up to 14% in QA accuracy.