NeurIPS 2022

Kernel Multimodal Continuous Attention

Alexander Moreno, Zhenke Wu, Supriya Nagesh, Walter H. Dempsey, James M. Rehg


Abstract

Attention mechanisms average a data representation with respect to probability weights. Recently, [23, 24, 25] proposed continuous attention, focusing on unimodal exponential and deformed exponential family attention densities; the latter can have sparse support. [8] extended this approach to multimodal attention via Gaussian mixture attention densities. In this paper, we propose using kernel exponential families [4] and our new sparse counterpart, kernel deformed exponential families. Theoretically, we establish new existence results for both families, as well as approximation capabilities for the deformed case. Because the context vector has no closed-form expression, we compute it by numerical integration, and we prove that this approximation converges exponentially for both families. Experiments show that kernel continuous attention often outperforms unimodal continuous attention, and that the sparse variant tends to highlight time-series peaks.
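The abstract gives no formulas, so the following is only a minimal NumPy sketch of the idea under stated assumptions. It assumes a 1-D time domain [0, 1] with a uniform base density, a Gaussian (RBF) kernel, and a score f(t) = Σ_i γ_i k(t, c_i) with hypothetical centers `c_i` and coefficients `γ_i`. The dense density is p(t) ∝ exp(f(t)). For the sparse case it uses a sparsemax-style (α = 2) thresholded density p(t) = max(f(t) − τ, 0) as a stand-in for the kernel deformed exponential family. The context vector is c = ∫ p(t) V(t) dt, computed with the composite trapezoidal rule on a uniform grid. The value function `V(t)`, all function names, and the choice of quadrature rule are hypothetical; the paper's parameterization and the integration scheme for which exponential convergence is proved may differ.

```python
import numpy as np


def trapezoid(y, x):
    # Composite trapezoidal rule over the first axis of y (y may be 1-D or 2-D).
    dx = np.diff(x).reshape(-1, *([1] * (y.ndim - 1)))
    return np.sum(dx * (y[1:] + y[:-1]) / 2.0, axis=0)


def kernel_score(t, centers, gamma, bandwidth):
    # f(t) = sum_i gamma_i * k(t, c_i) with a Gaussian (RBF) kernel k; f lies in the kernel's RKHS.
    k = np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2.0 * bandwidth**2))
    return k @ gamma


def kernel_exp_density(grid, f_vals):
    # Kernel exponential family with a uniform base density: p(t) proportional to exp(f(t)).
    # The normalizer has no closed form, so it is computed numerically.
    unnorm = np.exp(f_vals - f_vals.max())  # shift by max(f) for numerical stability
    return unnorm / trapezoid(unnorm, grid)


def kernel_sparsemax_density(grid, f_vals, iters=100):
    # Hypothetical sparse variant (alpha = 2): p(t) = max(f(t) - tau, 0),
    # with the threshold tau found by bisection so that p integrates to 1.
    # On a grid spanning [0, 1]: mass > 1 at lo, mass = 0 at hi.
    lo, hi = f_vals.min() - 1.0, f_vals.max()
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        mass = trapezoid(np.maximum(f_vals - tau, 0.0), grid)
        if mass > 1.0:
            lo = tau  # too much mass: raise the threshold
        else:
            hi = tau  # too little mass: lower the threshold
    return np.maximum(f_vals - 0.5 * (lo + hi), 0.0)


def context_vector(grid, density, values):
    # c = integral of p(t) V(t) dt via the trapezoidal rule; values has shape (len(grid), d).
    return trapezoid(density[:, None] * values, grid)


grid = np.linspace(0.0, 1.0, 2001)
centers = np.array([0.2, 0.7])  # hypothetical kernel centers (two modes)
gamma = np.array([30.0, 25.0])  # hypothetical RKHS coefficients
f_vals = kernel_score(grid, centers, gamma, bandwidth=0.05)
values = np.stack([grid, np.sin(2 * np.pi * grid)], axis=1)  # hypothetical V(t), d = 2

p_exp = kernel_exp_density(grid, f_vals)
p_sparse = kernel_sparsemax_density(grid, f_vals)
c_exp = context_vector(grid, p_exp, values)
c_sparse = context_vector(grid, p_sparse, values)
```

With these settings both densities are bimodal, with modes near t = 0.2 and t = 0.7. The exponential-family density is strictly positive everywhere. The thresholded density is exactly zero away from the two modes, which illustrates the sparse support that the deformed family allows.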