ICLR2026

Instance-Dependent Fixed-Budget Pure Exploration in Reinforcement Learning

Yeongjong Kim, Yeoneung Kim, Kwang-Sung Jun

Abstract

We study the problem of fixed budget pure exploration in reinforcement learning.The goal is to identify a near-optimal policy, given a fixed budget on the number of interactions with the environment. Unlike the standard PAC setting, we do not require the target error level ϵ\epsilon and failure rate δ\delta as input. We propose novel algorithms and provide, to the best of our knowledge, the first instance-dependent ϵ\epsilon-uniform guarantee, meaning that the probability that ϵ\epsilon-correctness is ensured can be obtained simultaneously for all ϵ\epsilon above a budget-dependent threshold. It characterizes the budget requirements in terms of the problem-specific hardness of exploration. As a core component of our analysis, we derive a ϵ\epsilon-uniform guarantee for the multiple bandit problem—solving multiple multi-armed bandit instances simultaneously—which may be of independent interest. To enable our analysis, we also develop tools for reward-free exploration under the fixed-budget setting, which we believe will be useful for future work.