EMNLP 2024
Making Large Language Models Better Reasoners with Orchestrated Streaming Experiences
Xiangyang Liu, Junliang He, Xipeng Qiu
Abstract
Large language models (LLMs) can perform complex reasoning by generating intermediate thoughts under zero-shot or few-shot settings. However, zero-shot prompting often performs poorly, and the superior performance of few-shot prompting hinges on manually crafted demonstrations. In this paper, we present RoSE (Reasoning with Orchestrated Streaming Experiences), a general framework for solving reasoning tasks that can self-improve without complex external effort. To enable RoSE, we describe an architecture that extends an LLM to store all answered questions and their thoughts in a streaming experience pool, and then orchestrates helpful questions from the pool to assist in answering new questions. To set up a question-aware orchestration mechanism, RoSE first calculates the similarity between each question in the pool and a new test question. Since the solution to an answered question is not always correct, RoSE sorts the questions by their similarity to the new question and uniformly divides them into multiple buckets. It then extracts one question from each bucket so that the extracted questions are more diverse. To make the extracted questions as helpful as possible for answering new questions, we introduce two additional attributes for each question: uncertainty and complexity. RoSE preferentially selects questions with low uncertainty and high complexity from each bucket. We evaluate the versatility of RoSE across various reasoning tasks, LLMs, and CoT methods.

Recently, the chain-of-thought (CoT) prompting technique (Wei et al., 2022) was proposed to have LLMs generate intermediate reasoning paths before generating their final answers. This prompting makes LLMs think more deliberately before answering and further enhances their reasoning ability.
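As a rough illustration of the question-aware orchestration summarized in the abstract, the Python sketch below stores every answered question in a streaming pool, ranks the pool by similarity to a new question, uniformly splits the ranking into buckets, and takes one question per bucket, preferring low uncertainty and then high complexity. This section does not say how similarity, uncertainty, or complexity are computed, so cosine similarity over precomputed embeddings, the lexicographic (uncertainty, then complexity) preference, the Q/A prompt layout, and the names `Experience`, `orchestrate`, `build_prompt`, and `answer_stream` are all hypothetical choices, not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class Experience:
    """One answered question in the streaming experience pool (hypothetical schema)."""
    question: str
    thoughts: str            # model-generated reasoning path and answer; may be wrong
    embedding: list[float]   # assumed precomputed question embedding
    uncertainty: float       # lower is preferred (scoring method assumed)
    complexity: float        # higher is preferred (scoring method assumed)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def split_into_buckets(items: list, k: int) -> list[list]:
    """Split a list into k contiguous buckets whose sizes differ by at most one."""
    size, extra = divmod(len(items), k)
    buckets, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        buckets.append(items[start:end])
        start = end
    return buckets


def orchestrate(pool: list[Experience], query_emb: list[float], k: int) -> list[Experience]:
    """Pick up to k demonstrations, one per similarity bucket."""
    if not pool or k <= 0:
        return []
    # 1. Rank stored questions from most to least similar to the new question.
    ranked = sorted(pool, key=lambda e: cosine(e.embedding, query_emb), reverse=True)
    # 2. Uniformly divide the ranking into buckets (never more buckets than items).
    buckets = split_into_buckets(ranked, min(k, len(ranked)))
    # 3. In each bucket prefer the lowest uncertainty, breaking ties by highest complexity.
    return [min(b, key=lambda e: (e.uncertainty, -e.complexity)) for b in buckets]


def build_prompt(demos: list[Experience], question: str) -> str:
    """Few-shot CoT prompt: selected questions with their stored thoughts, then the new question."""
    parts = [f"Q: {d.question}\nA: {d.thoughts}" for d in demos]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


def answer_stream(
    questions: list[str],
    embed: Callable[[str], list[float]],
    llm: Callable[[str], str],
    score: Callable[[str, str], tuple[float, float]],
    k: int = 4,
) -> list[str]:
    """Answer questions one at a time, adding every answered question to the pool."""
    pool: list[Experience] = []
    outputs = []
    for q in questions:
        emb = embed(q)
        thoughts = llm(build_prompt(orchestrate(pool, emb, k), q))
        uncertainty, complexity = score(q, thoughts)  # hypothetical scorer
        pool.append(Experience(q, thoughts, emb, uncertainty, complexity))
        outputs.append(thoughts)
    return outputs
```

Sorting before bucketing means each selected demonstration comes from a different similarity band. This is how bucketing yields diversity instead of k near-duplicates of the single most similar question. The within-bucket preference then guards against copying a similar but unreliable (high-uncertainty) solution. Here `embed`, `llm`, and `score` are placeholder callables that a real system would back with an embedding model, an LLM, and whatever uncertainty/complexity estimates it uses.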
In addition, the zero-shot CoT prompt "Let's think step by step" (Kojima et al., 2022) also enhances the reasoning ability of LLMs without any manually crafted demonstrations. After CoT prompting was proposed, more studies tried to manually design better prompts (Zhou et al., 2023; Wang et al., 2023a; Yao et al., 2023a) to further improve the reasoning performance of LLMs. However, no matter how the prompts change, the goal is to have LLMs generate intermediate reasoning steps. Recent works such as ReAct (Yao et al., 2023b), Reflexion (Shinn et al., 2023), REMEMBERER (Zhang et al., 2023a), and ExpeL (Zhao et al., 2023) have demonstrated the feasibility of autonomous agents built on top of an LLM core. These methods use LLMs to generate reasoning paths and "actions", which can be used in API calls and executed in an environment. Moreover, some of these methods present golden feedback to LLMs during the reasoning process (Shinn et al., 2023; Zhang et al., 2023a) or require labeled samples to collect correct or incorrect experiences (Zhao et al., 2023). Overall, these methods still require humans to carefully design demonstrations and rely on golden feedback, labeled samples, or external tools to improve the reasoning performance of LLMs. We investigate how to improve the reasoning performance of LLMs in a more challenging streaming setting without any labeled data, pre-set unlabeled data, feedback signals, or other external help. Inspired by the observation that humans constantly do various exercises to construct a large experience pool in their minds and use the pool to help them quickly and better answer questions in ex-