EMNLP 2025

Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains

Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma

Abstract

Integrating large language models (LLMs) as action proposers in reinforcement learning (RL) improves performance in text-based environments but incurs high computational costs. We introduce a cache-efficient framework for Bayesian RL with LLM-derived action suggestions that reduces these costs while maintaining near-optimal performance. At its core is an adaptive cache whose behavior is meta-learned from policy performance, enabling efficient inference in text-based games (e.g., TextWorld, ALFWorld) and robotic control tasks (e.g., MuJoCo, Meta-World). The framework achieves a 3.8–4.7× reduction in LLM queries and 4.0–12.0× lower median latencies (85–93 ms on consumer hardware) while retaining 96–98% of uncached performance. Theoretical KL-divergence bounds ensure that cached decisions remain reliable, and we validate them empirically across tasks, with success rates of 90.4–95.6% in text environments. For offline RL, our CQL-Prior variant improves performance by 14–29% and reduces training time by 38–40%. Evaluations across eight diverse tasks demonstrate the framework's generalizability and practicality for resource-constrained settings, making LLM-guided RL viable for both text-based and robotic applications.
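To make the caching idea concrete, the following is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes that an LLM call (the hypothetical `query_llm`) returns a discrete prior over actions for a hashable state, that a cached prior is periodically revalidated by re-querying and comparing the two priors by KL divergence against a fixed threshold (the paper meta-learns its cache; no meta-learning is shown here), and that actions are drawn from the KL-regularized policy π(a) ∝ prior(a)·exp(Q(a)/τ), which is a standard choice swapped in for illustration. The names `PriorCache`, `kl_threshold`, `refresh_every`, and `sample_action` are hypothetical. The `query_reduction` ratio (lookups per LLM query) corresponds to the "×-reduction in LLM queries" metric quoted above.

```python
import math
import random
from typing import Callable, Dict, Hashable, List


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same action set."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


class PriorCache:
    """Hypothetical cache of LLM-derived action priors keyed by a hashable state.

    Assumption: every `refresh_every`-th hit on an entry re-queries the LLM, and the
    entry is replaced only if the fresh prior drifts by more than `kl_threshold`.
    """

    def __init__(self, query_llm: Callable[[Hashable], List[float]],
                 kl_threshold: float = 0.05, refresh_every: int = 10):
        self.query_llm = query_llm
        self.kl_threshold = kl_threshold
        self.refresh_every = refresh_every
        self.entries: Dict[Hashable, List[float]] = {}
        self.hits: Dict[Hashable, int] = {}
        self.lookups = 0
        self.llm_queries = 0

    def _query(self, state: Hashable) -> List[float]:
        # Every real LLM call is counted so the query reduction can be measured.
        self.llm_queries += 1
        return self.query_llm(state)

    def prior(self, state: Hashable) -> List[float]:
        self.lookups += 1
        if state not in self.entries:  # cache miss: ask the LLM once
            self.entries[state] = self._query(state)
            self.hits[state] = 0
            return self.entries[state]
        self.hits[state] += 1
        if self.hits[state] % self.refresh_every == 0:  # periodic revalidation
            fresh = self._query(state)
            if kl_divergence(fresh, self.entries[state]) > self.kl_threshold:
                self.entries[state] = fresh
        return self.entries[state]

    def query_reduction(self) -> float:
        """Lookups served per LLM query (e.g. 4.0 means a 4x reduction)."""
        return self.lookups / max(self.llm_queries, 1)


def sample_action(prior: List[float], q_values: List[float],
                  tau: float = 1.0, rng=random) -> int:
    """Sample from pi(a) proportional to prior(a) * exp(Q(a) / tau)."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [p * math.exp((q - m) / tau) for p, q in zip(prior, q_values)]
    return rng.choices(range(len(prior)), weights=weights, k=1)[0]
```

For example, 20 lookups of the same state with `refresh_every=10` cost one LLM query for the initial miss plus one revalidation at the tenth hit, i.e. 2 queries and a 10× reduction. In this sketch the threshold trades freshness for cost: a larger `kl_threshold` or `refresh_every` saves more queries at the risk of acting on a stale prior.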