CVPR 2023

TempSAL - Uncovering Temporal Information for Deep Saliency Prediction

Bahar Aydemir, Ludo Hoffstetter, Tong Zhang, Mathieu Salzmann, Sabine Süsstrunk

Abstract

Figure 1. An example of how human attention evolves over time when observing a single image. Top row: Temporal and image saliency ground truth from the SALICON dataset [20]. Bottom row: Our temporal and image saliency predictions. Each temporal saliency map T_i, i ∈ {1, ..., 5}, represents one second of observation time. Note that in T_1 the chef is salient, while in T_2 and T_3 the food on the barbecue becomes the most salient region in the scene. We can predict the temporal saliency maps for each interval separately, or combine them to create a single, refined image saliency map for the entire observation period.
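
To make the notion of per-second saliency maps concrete, the sketch below bins timestamped fixations into five one-second slices and turns each slice into a normalized map. It is only an illustration under stated assumptions: the `(x, y, t_ms)` fixation format, the Gaussian width `sigma`, and the function names `temporal_saliency_maps` and `combine_slices` are hypothetical, and the plain averaging in `combine_slices` is a simple stand-in for combining slices, not the method used in the paper.

```python
import numpy as np


def temporal_saliency_maps(fixations, height, width,
                           n_slices=5, duration_ms=5000.0, sigma=10.0):
    """Bin fixations (x, y, t_ms) into time slices and blur each slice.

    Assumed input format: pixel coordinates plus a timestamp in
    milliseconds measured from image onset. Returns an array of shape
    (n_slices, height, width); each non-empty slice sums to 1.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    maps = np.zeros((n_slices, height, width), dtype=np.float64)
    slice_len = duration_ms / n_slices

    for x, y, t in fixations:
        # Clamp so a fixation at exactly duration_ms lands in the last slice.
        idx = min(int(t // slice_len), n_slices - 1)
        # Add an isotropic Gaussian centered on the fixation.
        maps[idx] += np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))

    # Normalize each slice to a distribution; empty slices stay all-zero.
    sums = maps.sum(axis=(1, 2), keepdims=True)
    return np.divide(maps, sums, out=np.zeros_like(maps), where=sums > 0)


def combine_slices(maps):
    """Average the slices into one map (an assumed, non-learned combination)."""
    combined = maps.mean(axis=0)
    total = combined.sum()
    return combined / total if total > 0 else combined


# Toy example: one fixation in the first second, one in the second, one in the third.
fixations = [(10, 20, 300.0), (40, 25, 1500.0), (42, 26, 2500.0)]
maps = temporal_saliency_maps(fixations, height=48, width=64)
image_map = combine_slices(maps)
```

With the toy fixations above, `maps[0]` peaks at the first fixation and `maps[1]` at the second, loosely mirroring how the most salient region shifts between T_1 and T_2 in the figure.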