CVPR 2021

Visual Navigation With Spatial Attention

Bar Mayo, Tamir Hazan, Ayellet Tal

Abstract

Figure 1. Visual navigation. Panels: (a) paths, ours and [30]'s; (b) our agent's view; (c) our attention; (d) [30]'s view. (a) The agent aims to find a TV (red rectangle) in a living room (top view), starting from a given location (black circle). Our agent's path is marked in orange and [30]'s path in magenta. At each step, the agent is given a specific view that depends on its position. In this example, our agent starts by turning around at its starting location to gather information, a strategy it has learned. (b) shows our agent's view before its first move forward, whereas (d) shows [30]'s view before its first move forward. (c) shows our attention model, which combines semantic and spatial information from the view in (b); it directs our agent to move forward, towards the TV. In contrast, the view in (d) is part of [30]'s lengthy exploration (magenta path in (a)) in search of the TV.