ACL2024

SirLLM: Streaming Infinite Retentive LLM

Yao Yao, Zuchao Li, Hai Zhao

Abstract

As Large Language Models (LLMs) become increasingly prevalent in various domains, their ability to process inputs of any length and maintain a degree of memory becomes essential. However, feeding overly long texts to an LLM in a single pass is of limited use: studies have shown that when input lengths exceed the LLMs' pre-training text length, their text generation capabilities decline dramatically. Moreover, simply extending the length of pre-training texts is impractical, because long text data are difficult to obtain and doing so would entail substantial memory consumption for LLMs. Recent efforts have employed streaming inputs to alleviate the pressure of excessively long inputs, but this approach can significantly impair the model's long-term memory. Motivated by this challenge, we introduce Streaming Infinite Retentive LLM (SirLLM), which allows LLMs to maintain longer memory during infinite-length dialogues without the need for fine-tuning. SirLLM uses the Token Entropy metric and a memory decay mechanism to filter key phrases, endowing LLMs with memory that is both long-lasting and flexible. We designed three distinct tasks and constructed three datasets to measure the effectiveness of SirLLM from various angles: (1) DailyDialog; (2) Grocery Shopping; (3) Rock-Paper-Scissors. Our experimental results show that SirLLM achieves stable and significant improvements across different LLMs and tasks. When having a conversation, "A sir could forget himself," but SirLLM never does!

2024). These applications, aiming to enhance user interaction and conversational experience, often require infinite input length and a certain degree of memory capability.
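The abstract names two ingredients, a Token Entropy score and a memory decay mechanism, that together decide which key phrases stay in memory. The Python sketch below illustrates one way such ingredients could combine in a bounded memory. It assumes that a token's entropy is its negative log-probability under the model and that decay multiplies every stored score by a constant factor once per dialogue round; `token_entropy`, `KeyTokenMemory`, `capacity`, and `decay` are hypothetical names, and the sketch operates on token strings rather than on a real key-value cache, so it should not be read as SirLLM's implementation.

```python
import math
from dataclasses import dataclass, field


def token_entropy(probs: list[float]) -> list[float]:
    """Score each token by -log p(token | prefix); surprising tokens score higher.

    ASSUMPTION: `probs` holds the model's probability for each observed token.
    """
    return [-math.log(max(p, 1e-12)) for p in probs]  # guard against log(0)


@dataclass
class KeyTokenMemory:
    """Hypothetical bounded memory that keeps the highest-scoring tokens."""
    capacity: int        # hypothetical: maximum number of retained tokens
    decay: float = 0.9   # hypothetical: per-round multiplier on stored scores
    items: list[tuple[str, float]] = field(default_factory=list)

    def add_round(self, tokens: list[str], probs: list[float]) -> None:
        # Memory decay (assumed form): shrink every stored score once per round.
        self.items = [(tok, score * self.decay) for tok, score in self.items]
        self.items.extend(zip(tokens, token_entropy(probs)))
        # Keep the top-`capacity` tokens by score, preserving their original order.
        ranked = sorted(range(len(self.items)),
                        key=lambda i: self.items[i][1], reverse=True)
        keep = sorted(ranked[: self.capacity])
        self.items = [self.items[i] for i in keep]

    def tokens(self) -> list[str]:
        return [tok for tok, _ in self.items]


mem = KeyTokenMemory(capacity=2, decay=0.5)
mem.add_round(["I", "like", "mangoes"], [0.5, 0.4, 0.01])
mem.add_round(["ok", "sure"], [0.9, 0.8])
print(mem.tokens())  # → ['like', 'mangoes']
```

Because stored scores shrink by `decay` every round, a sufficiently surprising new token can eventually displace an old one: in this sketch a smaller `decay` makes memory more flexible, while a value closer to 1 makes it more long-lasting.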
However, current LLMs are usually pre-trained on texts of limited length, and studies have shown that their text generation capabilities decline dramatically when input lengths exceed those of the pre-training texts (Xiao et al., 2023; Huang et al., 2023). Merely extending the length of pre-training texts is impractical: acquiring infinitely long text data is exceedingly challenging, and doing so would also result in substantial memory consumption for LLMs. Therefore, enabling LLMs to handle infinite input lengths while maintaining memory capability is an urgent issue to be addressed.

With the emergence of this demand, researchers have gradually shifted their focus towards exploring ways to expand the input context length of LLMs. One line of these studies has focused on optimizing the attention mechanism of LLMs. Beltagy et al. (2020) first proposed sliding-window attention, as shown in Figure 1 (a).
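To make sliding-window attention concrete, the Python sketch below builds the boolean attention mask for a short sequence. It assumes the causal form used in autoregressive decoding, in which each token attends only to itself and the previous `window - 1` tokens (the original Longformer window is symmetric around each token); `sliding_window_mask` and its parameters are illustrative names, not a library API.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key j.

    Causal sliding window (an assumption, see above): position i sees
    itself and the `window - 1` positions immediately before it.
    """
    return [[i - window < j <= i for j in range(seq_len)] for i in range(seq_len)]


mask = sliding_window_mask(seq_len=5, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# x....
# xx...
# xxx..
# .xxx.
# ..xxx
```

Each row has at most `window` allowed entries, so the attention cost per token stays constant as the sequence grows, but tokens that fall outside the window can no longer be attended to.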