NeurIPS 2023

Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory

Ankur Sikarwar, Mengmi Zhang

Cited by 6

Abstract

Working memory (WM), a fundamental cognitive process facilitating the temporary storage, integration, manipulation, and retrieval of information, plays a vital role in reasoning and decision-making tasks. Robust benchmark datasets that capture the multifaceted nature of WM are crucial for the effective development and evaluation of AI WM models. Here, we introduce a comprehensive Working Memory (WorM) benchmark dataset for this purpose. WorM comprises 10 tasks and a total of 1 million trials, assessing 4 functionalities, 3 domains, and 11 behavioral and neural characteristics of WM. We jointly trained and tested state-of-the-art recurrent neural networks and transformers on all these tasks. For comparison, we also include human behavioral benchmarks. None of the computational models were trained with human data or behavioral biases; yet these models replicate some characteristics of WM in biological brains, such as primacy and recency effects. Moreover, we performed neural population analysis on these models and identified neural clusters specialized for different domains and functionalities of WM. Not all computational models align strongly with all human behaviors, and our experimental results reveal several limitations of existing models in matching the working memory capabilities of humans. This dataset serves as a valuable resource for communities in cognitive psychology, neuroscience, and AI, offering a standardized framework to compare and enhance WM models, investigate WM's neural underpinnings, and develop WM models with human-like capabilities. Our source code and data are available at: link.

In the field of AI, WM has also attracted considerable attention. AI researchers have aimed to develop computational models that can emulate and augment human-like WM capabilities.
Various memory architectures [10, 27, 25, 62, 56] have been proposed, such as neural networks with memory cells, recurrent neural networks, and memory-augmented neural networks. These models have demonstrated promising results in specific WM tasks, such as copying, sorting, and memory recall, as well as in real-world applications [22, 33], such as navigation in naturalistic environments, video recognition, and language processing. Recently, [45] established a comprehensive working memory (WM) benchmark to aid the development of WM theories by collecting empirical findings from previous human studies across a wide range of WM tasks. Different from their work, in this paper, we present a large-scale WM