ICLR2026

Learning in Prophet Inequalities with Noisy Observations

Jung-hun Kim, Vianney Perchet

Abstract

We study the prophet inequality, a fundamental problem in online decision-making and optimal stopping, in a practical setting where rewards are observed only through noisy realizations and reward distributions are unknown. At each stage, the decision-maker receives a noisy reward whose true value follows a linear model with an unknown latent parameter, and observes a feature vector drawn from a distribution. To address this challenge, we propose algorithms that integrate learning and decision-making via lower-confidence-bound (LCB) thresholding. In the i.i.d. setting, we establish that both an Explore-then-Decide strategy and an ε\varepsilon-Greedy variant achieve the sharp competitive ratio of 11/e1 - 1/e, under a mild condition on the optimal value. For non-identical distributions, we show that a competitive ratio of 1/21/2 can be guaranteed against a relaxed benchmark. Moreover, with limited window access to past rewards, the tight ratio of 1/21/2 against the optimal benchmark is achieved.