NeurIPS 2023

DISCS: A Benchmark for Discrete Sampling

Katayoon Goshvadi, Haoran Sun, Xingchao Liu, Azade Nova, Ruqi Zhang, Will Grathwohl, Dale Schuurmans, Hanjun Dai

Cited by 13

Abstract

Sampling in discrete spaces, with critical applications in simulation and optimization, has recently been boosted by significant advances in gradient-based approaches that exploit modern accelerators like GPUs. However, two key challenges hinder further research progress in discrete sampling. First, since there is no consensus on experimental settings, the empirical results in different research papers are often not comparable. Second, implementing samplers and target distributions often requires a nontrivial amount of effort in terms of calibration, parallelism, and evaluation. To tackle these challenges, we propose DISCS (DIScrete Sampling), a tailored package and benchmark that supports unified and efficient implementation and evaluation of discrete samplers on three types of tasks: sampling from classical graphical models, combinatorial optimization, and energy-based generative models. Through the comprehensive evaluations in DISCS, we gained new insights into scalability, design principles for proposal distributions, and lessons for adaptive sampler design. DISCS implements representative discrete samplers from existing research as baselines, and offers a simple interface through which researchers can conveniently design new discrete samplers and compare them directly with the baselines in a calibrated setup.

Recently, a family of locally balanced samplers (Zanella, 2020; Grathwohl et al., 2021; Sun et al., 2021; Zhang et al., 2022), using ratio-informed proposal distributions based on $\pi(y)/\pi(x)$, has significantly improved sampling efficiency by exploiting modern accelerators like GPUs and TPUs. From the perspective of gradient flow on the Wasserstein manifold of distributions, Gibbs sampling is simply a coordinate descent algorithm, whereas locally balanced samplers perform full gradient descent (Sun et al., 2022a). Despite these advances in locally balanced samplers, a quantitative benchmark is still missing.
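To make the ratio-informed proposal concrete, here is a minimal Python sketch. It is not DISCS code and rests on stated assumptions: the state space is binary vectors in {0,1}^d, the neighbourhood of x is all single-bit flips, the balancing function is g(t) = √t (one of the choices discussed by Zanella, 2020), and the ratios π(y)/π(x) are computed exactly by brute force instead of being approximated with gradients as in Grathwohl et al. (2021). The toy target `log_prob` and every helper name (`flip`, `gibbs_step`, `locally_balanced_step`, `run_chain`) are hypothetical. The Gibbs step inspects one randomly chosen coordinate, while the locally balanced step scores all d flips before proposing one, which loosely mirrors the coordinate-descent versus full-gradient-descent contrast described above.

```python
import math
import random


# Hypothetical toy target (not a DISCS model): a binary pairwise model with
# unnormalized log-density log pi(x) = sum_i theta_i x_i + sum_{i<j} J_ij x_i x_j.
def log_prob(x, theta, J):
    d = len(x)
    val = sum(theta[i] * x[i] for i in range(d))
    for i in range(d):
        for j in range(i + 1, d):
            val += J[i][j] * x[i] * x[j]
    return val


def flip(x, i):
    """Return a copy of x with bit i flipped (a Hamming-distance-1 neighbour)."""
    y = list(x)
    y[i] = 1 - y[i]
    return y


def log_ratios(x, logp):
    """Exact log pi(y) - log pi(x) for every neighbour y = flip(x, i)."""
    lx = logp(x)
    return [logp(flip(x, i)) - lx for i in range(len(x))]


def logsumexp(v):
    m = max(v)
    return m + math.log(sum(math.exp(a - m) for a in v))


def sigmoid(r):
    """Numerically stable 1 / (1 + exp(-r))."""
    if r >= 0:
        return 1.0 / (1.0 + math.exp(-r))
    e = math.exp(r)
    return e / (1.0 + e)


def gibbs_step(x, logp, rng):
    """Random-scan Gibbs: resample ONE uniformly chosen coordinate."""
    i = rng.randrange(len(x))
    y = flip(x, i)
    # Conditional probability of the flipped value: pi(y) / (pi(x) + pi(y))
    p_flip = sigmoid(logp(y) - logp(x))
    return y if rng.random() < p_flip else x


def locally_balanced_step(x, logp, rng):
    """Metropolis-Hastings with proposal Q(x, y) = g(pi(y)/pi(x)) / Z(x)
    over all single-bit flips, using g(t) = sqrt(t)."""
    # log g(pi(y)/pi(x)) = 0.5 * log-ratio; every coordinate gets scored
    logits_x = [0.5 * r for r in log_ratios(x, logp)]
    log_zx = logsumexp(logits_x)
    probs = [math.exp(l - log_zx) for l in logits_x]
    i = rng.choices(range(len(x)), weights=probs)[0]
    y = flip(x, i)
    log_zy = logsumexp([0.5 * r for r in log_ratios(y, logp)])
    # With g = sqrt, pi(y) Q(y, x) / (pi(x) Q(x, y)) simplifies to Z(x) / Z(y)
    accept_prob = math.exp(min(0.0, log_zx - log_zy))
    return y if rng.random() < accept_prob else x


def run_chain(step, x0, logp, n_steps, seed=0):
    """Run a chain with a fixed seed and count visits to each state."""
    rng = random.Random(seed)
    x, counts = list(x0), {}
    for _ in range(n_steps):
        x = step(x, logp, rng)
        counts[tuple(x)] = counts.get(tuple(x), 0) + 1
    return counts


# Usage on a hypothetical 2-bit target, small enough to enumerate exactly.
theta, J = [0.5, -0.3], [[0.0, 0.8], [0.0, 0.0]]


def logp(x):
    return log_prob(x, theta, J)


states = [(0, 0), (0, 1), (1, 0), (1, 1)]
unnorm = {s: math.exp(logp(list(s))) for s in states}
total = sum(unnorm.values())
exact = {s: w / total for s, w in unnorm.items()}

n_steps = 20000
gibbs_counts = run_chain(gibbs_step, [0, 0], logp, n_steps, seed=1)
lb_counts = run_chain(locally_balanced_step, [0, 0], logp, n_steps, seed=1)
gibbs_freq = {s: gibbs_counts.get(s, 0) / n_steps for s in states}
lb_freq = {s: lb_counts.get(s, 0) / n_steps for s in states}
print("exact           ", exact)
print("gibbs           ", gibbs_freq)
print("locally balanced", lb_freq)
```

With g(t) = √t, the Metropolis-Hastings correction only needs the two normalizers Z(x) and Z(y). On this tiny target, both chains' visit frequencies should approach the exact probabilities. On large models, the d exact ratio evaluations per step are the cost that gradient-based approximations and accelerators aim to reduce.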
One important reason is that there is no consensus on the experimental setting. In particular, the initialization of energy-based generative models, the random seeds used in graphical models, and the protocol for hyper-parameter tuning all have a significant impact on performance. As a result, some empirical results reported in different research papers may not be comparable. Under these circumstances, a unified benchmark is urgently needed to advance research in discrete sampling.