NeurIPS 2022

Peer Prediction for Learning Agents

Shi Feng, Fang-Yi Yu, Yiling Chen

Cited by 9

Abstract

Peer prediction refers to a collection of mechanisms for eliciting information from human agents when direct verification of the obtained information is unavailable. These mechanisms are designed to have a game-theoretic equilibrium in which everyone reveals their private information truthfully. This result holds under the assumption that agents are Bayesian and that each adopts a fixed strategy across all tasks. Human agents, however, are observed in many domains to exhibit learning behavior in sequential settings. In this paper, we explore the dynamics of sequential peer prediction mechanisms when participants are learning agents. We first show that the notion of no regret alone for the agents' learning algorithms cannot guarantee convergence to the truthful strategy. We then focus on a family of learning algorithms where strategy updates depend only on agents' cumulative rewards, and prove that agents' strategies in the popular Correlated Agreement (CA) mechanism converge to truthful reporting when they use algorithms from this family. This family of algorithms is not necessarily no-regret, but it includes several familiar no-regret learning algorithms (e.g., multiplicative weight update and Follow the Perturbed Leader) as special cases. Simulation of several algorithms in this family, as well as the ε-greedy algorithm, which is outside of this family, shows convergence to the truthful strategy in the CA mechanism.

A fundamental challenge in many domains is to elicit high-quality information from people when directly verifying the acquired information is not feasible, either because the ground truth is not available or because it is too costly to obtain. Notable settings include asking people to label data for machine learning, having students perform peer grading in education, and soliciting customer feedback for products and services.
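To make concrete the idea from the abstract of a learning algorithm whose strategy updates depend only on cumulative rewards, the following is a minimal sketch of a multiplicative weight (Hedge-style) update. It is an illustration under stated assumptions, not the paper's implementation: the function names, the step size `eta`, the two-strategy setup, and the reward stream are all hypothetical, and the sketch assumes the agent observes a reward for every strategy in each round.

```python
import math

def hedge_distribution(cumulative_rewards, eta=0.1):
    """Map cumulative rewards to a mixed strategy (multiplicative weights)."""
    # Subtracting the maximum before exponentiating avoids overflow and
    # leaves the normalized probabilities unchanged.
    top = max(cumulative_rewards)
    weights = [math.exp(eta * (r - top)) for r in cumulative_rewards]
    total = sum(weights)
    return [w / total for w in weights]

def update(cumulative_rewards, rewards_this_round):
    """Add this round's per-strategy rewards to the running totals."""
    return [c + r for c, r in zip(cumulative_rewards, rewards_this_round)]

# Hypothetical example: index 0 = "truthful", index 1 = "flip the signal".
# The reward stream below is made up purely for illustration.
cumulative = [0.0, 0.0]
for _ in range(50):
    cumulative = update(cumulative, [1.0, 0.0])

probs = hedge_distribution(cumulative, eta=0.1)
```

The mixed strategy here is a function of the cumulative reward vector alone, which is the defining property of the family the abstract describes; how rewards are generated by the CA mechanism is deliberately left out of this sketch.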
The peer prediction literature has made impressive progress on this challenge in the past two decades, with many mechanisms that have desirable incentive properties developed for this problem [