KDD2024

From Word-prediction to Complex Skills: Compositional Thinking and Metacognition in LLMs

Sanjeev Arora


Abstract

The talk will present evidence that today's large language models (LLMs) display a somewhat deeper "understanding" than one would naively expect:

1. When asked to solve a task by combining a set of k simpler skills (a "test of compositional capability"), they are able to do so despite not having seen the same combination of skills during their training.

2. They demonstrate an ability to reason about their own learning processes, which is analogous to "metacognitive knowledge" [Flavell'76] in humans. For instance, given examples of an evaluation task, they can produce a catalog of suitably named skills that are relevant for solving each example of that task. Furthermore, this catalog of skills is meaningful, in the sense that incorporating it into training pipelines improves performance on that task, including the performance of other, unrelated LLMs.

We discuss mechanisms by which such complex understanding could arise (including a theory by [Arora,Goyal'23] that tries to explain point 1), and we also give examples of how to leverage LLM meta-knowledge to improve LLM training pipelines as well as evaluations.
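To make the compositional test (combining k simpler skills) concrete, here is a minimal sketch of how such a prompt might be assembled. It is an illustration under stated assumptions, not the evaluation used in the talk: the skill names in `SKILLS`, the helpers `count_combinations` and `make_composition_prompt`, and the prompt wording are all hypothetical. It also counts the possible k-subsets of a skill catalog, which shows how quickly the number of distinct skill combinations grows with k.

```python
import math
import random

# Hypothetical skill catalog: placeholder names for illustration only,
# not the skills used in any actual evaluation.
SKILLS = [
    "metaphor", "red herring", "modus ponens", "spatial reasoning",
    "self-serving bias", "emotional appeal", "irony", "statistical syllogism",
]


def count_combinations(num_skills: int, k: int) -> int:
    """Number of distinct k-subsets of a catalog containing num_skills skills."""
    return math.comb(num_skills, k)


def make_composition_prompt(skills, k, topic, rng):
    """Sample k distinct skills and build a prompt asking for text that uses all of them.

    The prompt template is a hypothetical example, not a published one.
    """
    chosen = rng.sample(skills, k)  # k distinct skills, no repeats
    prompt = (
        f"Write a short piece of text about {topic} that illustrates "
        f"all of the following skills: {', '.join(chosen)}."
    )
    return chosen, prompt


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sampled skills are reproducible
    chosen, prompt = make_composition_prompt(SKILLS, 3, "gardening", rng)
    print(prompt)
    # The number of possible combinations grows quickly with k:
    # 100 skills taken 5 at a time gives 75,287,520 subsets.
    print(count_combinations(100, 5))
```

Under these assumptions, even a modest catalog yields far more k-skill combinations than one could hope to enumerate. This is the intuition behind testing whether a model can handle combinations it is unlikely to have seen together.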