EMNLP 2025
FoREST: Frame of Reference Evaluation in Spatial Reasoning Tasks
Tanawan Premsri, Parisa Kordjamshidi
Abstract
Spatial reasoning is a fundamental aspect of human intelligence. One key concept in spatial cognition is the Frame of Reference (FoR), which identifies the perspective from which a spatial expression is described. Despite its significance, FoR has received limited attention in AI models that need spatial intelligence, and there is a lack of dedicated benchmarks and in-depth evaluation of large language models (LLMs) in this area. To address this issue, we introduce the Frame of Reference Evaluation in Spatial Reasoning Tasks (FoREST) benchmark, designed to assess FoR comprehension in LLMs. Using FoREST, we evaluate LLMs on answering questions that require FoR comprehension and on layout generation for text-to-image models. Our results reveal a notable performance gap across different FoR classes in various LLMs, which affects their ability to generate accurate layouts for text-to-image generation. This highlights critical shortcomings in FoR comprehension. To improve FoR understanding, we propose Spatial-Guided prompting, which improves LLMs' ability to extract primitive spatial concepts and relations. Our proposed method improves overall performance across spatial reasoning tasks.

[Figure: context generation and visualization example. A context is built from a list of objects as "<L> <relation> <R>", where L is the locatum and R is the relatum, e.g. "A cat is to the right of a dog." Disambiguated context: "A cat is to the right of a dog from the dog's perspective. A dog is facing toward the camera." Camera's perspective: "Q: Based on camera angle, where is the cat from the dog's position? A: Left". Relatum's perspective: "Q: In the dog view, how is the cat positioned in relation to the dog? A: Right".]
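The perspective shift in the cat-and-dog example (a cat to the dog's right, with the dog facing the camera, appears on the left from the camera's view) can be illustrated with a small sketch. This is not the benchmark's code: the function name `intrinsic_to_camera`, the orientation labels, and the convention that "front" in the camera's frame means "closer to the camera" are all assumptions made for illustration.

```python
# Illustrative sketch only; names and conventions here are hypothetical,
# not taken from FoREST.
# Ground-plane unit vectors as seen by the camera:
# +x is the camera's right, +y points away from the camera.
CAMERA_AXES = {
    "right": (1, 0),
    "left": (-1, 0),
    "back": (0, 1),    # farther from the camera (assumed convention)
    "front": (0, -1),  # closer to the camera (assumed convention)
}

# Direction the relatum is facing, expressed in the same camera coordinates.
FACING = {
    "toward_camera": (0, -1),
    "away_from_camera": (0, 1),
    "camera_right": (1, 0),
    "camera_left": (-1, 0),
}


def intrinsic_to_camera(relation: str, facing: str) -> str:
    """Convert a relation given in the relatum's own frame to the camera's frame."""
    fx, fy = FACING[facing]
    # Offsets of the locatum relative to the relatum, in camera coordinates.
    # The relatum's right is its facing vector rotated 90 degrees clockwise.
    offsets = {
        "front": (fx, fy),
        "back": (-fx, -fy),
        "right": (fy, -fx),
        "left": (-fy, fx),
    }
    target = offsets[relation]
    for name, vector in CAMERA_AXES.items():
        if vector == target:
            return name
    raise ValueError(f"unsupported relation or facing: {relation}, {facing}")


# The example from the figure: the cat is to the dog's right, the dog faces the camera.
print(intrinsic_to_camera("right", "toward_camera"))  # left
```

The sketch covers only four axis-aligned orientations and four relations; it is meant to show why the same scene yields "Right" from the relatum's perspective and "Left" from the camera's perspective, not to model the full set of FoR classes.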