NeurIPS 2022
Para-CFlows: $C^k$-universal diffeomorphism approximators as superior neural surrogates
Junlong Lyu, Zhitang Chen, Chang Feng, Wenjing Cun, Shengyu Zhu, Yanhui Geng, Zhijie Xu, Chen Yongwei
Cited by 7
Abstract
Invertible neural networks based on Coupling Flows (CFlows) have various applications, such as image synthesis and data compression. Approximation universality is of paramount importance for CFlows, because it ensures the expressiveness of the model. In this paper, we prove that CFlows can approximate any diffeomorphism in the $C^k$-norm if their layers can approximate certain single-coordinate transforms. Specifically, we show that a composition of affine coupling layers and invertible linear transforms achieves this universality. Furthermore, for parametric cases, where the diffeomorphism depends on extra parameters, we prove the corresponding approximation theorems for parametric coupling flows, named Para-CFlows. In practice, we apply Para-CFlows as a neural surrogate model in contextual Bayesian optimization tasks and demonstrate their superiority over other neural surrogate models in terms of optimization performance and gradient approximation. Code will be available at https://gitee.com/mindspore/models/tree/master/research/bo/paracflow.

* Equal contribution. 36th Conference on Neural Information Processing Systems (NeurIPS 2022).

[...] $S_c^{k,m,d}$, which is the set of transforms acting on a single coordinate of $x$. Recall the definition of the group of compactly supported parametric diffeomorphisms in Section 2.1, where $y$ acts as the parameters. One might ask: why not directly use the result for the non-parametric case, since all elements of $\mathrm{Diff}_c^{k,m,d}$ are also in $\mathrm{Diff}_c^{k}(\mathbb{R}^{m+d})$? Note that the decomposition theorem only ensures that $S_c^{k,m+d}$ can generate $\mathrm{Diff}_c^{k}(\mathbb{R}^{m+d})$, and hence $\mathrm{Diff}_c^{k,m,d}$; it does not guarantee that $S_c^{k,m,d}$ can generate $\mathrm{Diff}_c^{k,m,d}$. Without such a guarantee, we would have to use operators that may alter the coordinates of both $y$ and $x$, while requiring the coordinates of $y$ in the output layer to be exactly the same as those in the input layer, which restricts flexibility and increases the difficulty of approximation.
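To make the structural constraint discussed above concrete, here is a minimal sketch of a parametric affine coupling layer in plain Python. It only illustrates the shape of the map: the parameters $y$ and the first `split` coordinates of $x$ pass through untouched, and the remaining coordinates of $x$ are rescaled and shifted by functions of the untouched inputs. The names `toy_conditioner`, `coupling_forward`, `coupling_inverse`, and `split` are hypothetical and are not taken from the paper or its released code. The conditioner is a fixed toy function standing in for a learned network, and an affine map of this kind is not compactly supported, so it is not itself an element of $S_c^{k,m,d}$.

```python
import math


def toy_conditioner(x_a, y, n_out):
    """Hypothetical stand-in for a learned network.

    Returns n_out log-scales and n_out shifts that depend only on the
    untouched coordinates x_a and on the parameters y.
    """
    h = sum(x_a) + sum(y)
    log_s = [math.tanh(h + i) for i in range(n_out)]
    shift = [math.sin(h + i) for i in range(n_out)]
    return log_s, shift


def coupling_forward(x, y, split, conditioner=toy_conditioner):
    """Map (x, y) -> (x', y); x[:split] and y are returned unchanged."""
    x_a, x_b = list(x[:split]), list(x[split:])
    log_s, shift = conditioner(x_a, y, len(x_b))
    x_b_new = [b * math.exp(s) + t for b, s, t in zip(x_b, log_s, shift)]
    return x_a + x_b_new, list(y)


def coupling_inverse(x_new, y, split, conditioner=toy_conditioner):
    """Exact inverse: the scale and shift are recomputed from untouched inputs."""
    x_a, x_b_new = list(x_new[:split]), list(x_new[split:])
    log_s, shift = conditioner(x_a, y, len(x_b_new))
    x_b = [(b - t) * math.exp(-s) for b, s, t in zip(x_b_new, log_s, shift)]
    return x_a + x_b, list(y)


# d = 3 coordinates of x, m = 2 parameters y; split = 2 means that only
# the last coordinate of x is transformed.
x, y = [0.3, -1.2, 0.7], [0.5, 2.0]
x_new, y_out = coupling_forward(x, y, split=2)
x_back, _ = coupling_inverse(x_new, y_out, split=2)
```

With `split = d - 1` the layer changes a single coordinate of $x$ while leaving $y$ fixed by construction, which is the kind of restriction the parametric setting asks for. Inversion never requires inverting the conditioner, only re-evaluating it.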
Thus we derive new theorems dedicated to the parametric case.

Theorem 3.7. For any subgroup [...]

Proof. We sketch the two steps: 1) If $H \subseteq \mathrm{Diff}_c^{k,m,d}$ and $S_c^{k,m,d} \subseteq H$, then $H$ contains all near-identity diffeomorphisms (Def. B.3). The details are given in Cor. B.6. 2) If $H \subseteq \mathrm{Diff}_c^{k,m,d}$ contains all near-identity diffeomorphisms, then $H = \mathrm{Diff}_c^{k,m,d}$. The details are given in Lem. B.10.

Given the above theorem, we easily obtain $C^k$-universality for $\mathrm{Diff}_c^{k,m,d}$, following Section 3.1.

Theorem 3.8. Suppose $G$ is a group of diffeomorphisms over $\mathbb{R}^{m+d}$. If $G$ has $C^k$-universality for $S_c^{k,m,d}$, then $G$ has $C^k$-universality for $\mathrm{Diff}_c^{k,m,d}$.

The proof of G-INN m,d universality over $S_c^{k,m,d}$, when $G$ contains H-SACF m,d, follows that of Thm. 3.6 in Section 3.1, so the details are omitted here.
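Read together, the two steps of the proof sketch of Theorem 3.7 give the following chain of implications for a subgroup $H$. This is a restatement of the sketch only, not of the theorem itself; each arrow is labelled with the appendix result cited in the corresponding step.

```latex
\[
  S_c^{k,m,d} \subseteq H \subseteq \mathrm{Diff}_c^{k,m,d}
  \;\overset{\text{Cor. B.6}}{\Longrightarrow}\;
  H \text{ contains all near-identity diffeomorphisms (Def. B.3)}
  \;\overset{\text{Lem. B.10}}{\Longrightarrow}\;
  H = \mathrm{Diff}_c^{k,m,d}.
\]
```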