NeurIPS 2021
Self-Instantiated Recurrent Units with Dynamic Soft Recursion
Aston Zhang, Yi Tay, Yikang Shen, Alvin Chan, Shuai Zhang
Cited by 4
Abstract
While standard recurrent neural networks explicitly impose a chain structure on different forms of data, they do not have an explicit bias towards recursive self-instantiation where the extent of recursion is dynamic. Given diverse and even growing data modalities (e.g., logic, algorithmic input and output, music, code, images, and language) that can be expressed in sequences and may benefit from more architectural flexibility, we propose the self-instantiated recurrent unit (Self-IRU) with a novel inductive bias towards dynamic soft recursion. On one hand, the Self-IRU is characterized by recursive self-instantiation via its gating functions, i.e., gating mechanisms of the Self-IRU are controlled by instances of the Self-IRU itself, which are repeatedly invoked in a recursive fashion. On the other hand, the extent of the Self-IRU recursion is controlled by gates whose values are between 0 and 1 and may vary across the temporal dimension of sequences, enabling dynamic soft recursion depth at each time step. The architectural flexibility and effectiveness of our proposed approach are demonstrated across multiple data modalities. For example, the Self-IRU achieves state-of-the-art performance on the logical inference dataset [Bowman et al., 2014] even when compared with competitive models that have access to ground-truth syntactic information.

* Work was done at NTU.

35th Conference on Neural Information Processing Systems (NeurIPS 2021).

this design is motivated by the prefrontal cortex/basal ganglia working memory indirection [Kriete et al., 2013]. For example, a child Self-IRU instance drives the gating for outputting from its parent Self-IRU instance. Our proposed Self-IRU is also characterized by dynamically controlled recursion depths. Specifically, we design a dynamic soft recursion mechanism, which softly learns the depth of recursive self-instantiation on a per-time-step basis.
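As a rough illustration of the two ideas named here (a gate driven by a child instance of the same unit, and a soft per-time-step recursion depth), the following scalar toy may help. It is only a sketch under stated assumptions, not the formulation from the paper: the convex mix `alpha * child + (1 - alpha) * base`, the single forget-style gate, and every name (`init_params`, `soft_recursive_unit`, `w_alpha`, `w_gate`, `w_cand`) are hypothetical choices made for this example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(max_depth, seed=0):
    """One set of hypothetical scalar weights per recursion level."""
    rng = random.Random(seed)
    names = ("w_alpha", "w_gate", "w_cand")
    return [{n: rng.uniform(-1.0, 1.0) for n in names}
            for _ in range(max_depth + 1)]


def soft_recursive_unit(xs, params, depth=0):
    """Run a toy gated recurrent unit over a list of floats.

    Assumption: the gate pre-activation at each level is a soft mix of a
    plain linear term and the output of a child instance of this same
    function, weighted by alpha in (0, 1) computed per time step.
    """
    p = params[depth]
    # Plain (non-recursive) gate pre-activations, one per time step.
    pre = [p["w_gate"] * x for x in xs]
    if depth + 1 < len(params):
        # The child instance processes the same sequence one level deeper.
        child = soft_recursive_unit(xs, params, depth + 1)
        for t, x in enumerate(xs):
            alpha = sigmoid(p["w_alpha"] * x)  # soft recursion gate
            pre[t] = alpha * child[t] + (1.0 - alpha) * pre[t]
    # Simple recurrence: the gate decides how much old state to keep.
    h, out = 0.0, []
    for t, x in enumerate(xs):
        f = sigmoid(pre[t])
        z = math.tanh(p["w_cand"] * x)
        h = f * h + (1.0 - f) * z
        out.append(h)
    return out


params = init_params(max_depth=2)
xs = [0.5, -1.0, 2.0, 0.0]
hs = soft_recursive_unit(xs, params)
```

When `alpha` is close to 0 at some time step, the child instance contributes almost nothing there, so the effective recursion depth at that step is shallow; when it is close to 1, the gate is driven mostly by the deeper instance. The hard cap `max_depth` is an assumption of this sketch that keeps the recursion finite.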
More concretely, certain gates are reserved to control the extent of the Self-IRU recursion. Since the values of these gates are between 0 and 1 and may vary across the temporal dimension, they enable a dynamic soft recursion depth at each time step, which could lead to more architectural flexibility across diverse data modalities. This design of the Self-IRU is mainly inspired by adaptive computation time (ACT) [Graves, 2016], which learns the number of computational steps between an input and an output, and by recursive neural networks that operate on directed acyclic graphs. On one hand, the Self-IRU is reminiscent of ACT, albeit operating at the parameter level. While seemingly similar, the Self-IRU and ACT differ substantially in their objectives. Specifically, the goal of the Self-IRU is to dynamically expand the parameters of the model, not to dynamically decide how long to deliberate on input tokens in a sequence. On the other hand, the Self-IRU marries the benefit of recursive reasoning with recurrent models. However, in contrast to recursive neural networks, the Self-IRU is neither concerned with syntax-guided composition [