ICLR2022

Universal Approximation Under Constraints is Possible with Transformers

Anastasis Kratsios, Behnoosh Zamanlooy, Tianlin Liu, Ivan Dokmanic

被引用 38 次

摘要

Many practical problems need the output of a machine learning model to satisfy a set of constraints, KK. Nevertheless, there is no known guarantee that classical neural network architectures can exactly encode constraints while simultaneously achieving universality. We provide a quantitative constrained universal approximation theorem which guarantees that for any non-convex compact set KK and any continuous function f:RnKf:\mathbb{R}^n\rightarrow K, there is a probabilistic transformer F^\hat{F} whose randomized outputs all lie in KK and whose expected output uniformly approximates ff. Our second main result is a"deep neural version"of Berge's Maximum Theorem (1963). The result guarantees that given an objective function LL, a constraint set KK, and a family of soft constraint sets, there is a probabilistic transformer F^\hat{F} that approximately minimizes LL and whose outputs belong to KK; moreover, F^\hat{F} approximately satisfies the soft constraints. Our results imply the first universal approximation theorem for classical transformers with exact convex constraint satisfaction. They also yield that a chart-free universal approximation theorem for Riemannian manifold-valued functions subject to suitable geodesically convex constraints.