ICML 2024

An Information-Theoretic Analysis of In-Context Learning

Hong Jun Jeon, Jason D. Lee, Qi Lei, Benjamin Van Roy

Cited by 40

Abstract

Previous theoretical results pertaining to meta-learning on sequences build on contrived assumptions and are somewhat convoluted. We introduce new information-theoretic tools that lead to an elegant and very general decomposition of error into three components: irreducible error, meta-learning error, and intra-task error. These tools unify analyses across many meta-learning challenges. To illustrate, we apply them to establish new results about in-context learning with transformers. Our theoretical results characterize how error decays in both the number of training sequences and sequence lengths. Our results are very general; for example, they avoid contrived mixing time assumptions made by all prior results that establish decay of error with sequence length.

We provide concrete examples that resemble learning from data generated by a deep transformer model, and in the appendix we provide simpler problem instances for reference (logistic regression, linear representation learning).

Related Works

In-context Learning and Transformer. LLMs based on the transformer architecture (Vaswani et al., 2023) have exhibited the ability to learn from data within the context of a prompt (Brown et al., 2020). This phenomenon, referred to as in-context learning (ICL), has received significant empirical investigation (