NeurIPS 2020

Towards a Better Global Loss Landscape of GANs

Ruoyu Sun, Tiantian Fang, Alexander G. Schwing

Cited by 37

Abstract

Understanding of GAN training is still very limited. One major challenge is its non-convex-non-concave min-max objective, which may lead to sub-optimal local minima. In this work, we perform a global landscape analysis of the empirical loss of GANs. We prove that a class of separable-GANs, including the original JS-GAN, has exponentially many bad basins, which are perceived as mode collapse. We also study the relativistic pairing GAN (RpGAN) loss, which couples the generated samples and the true samples. We prove that RpGAN has no bad basins. Experiments on synthetic data show that the predicted bad basins can indeed appear in training. We also perform experiments to support our theory that RpGAN has a better landscape than separable-GAN. For instance, we empirically show that RpGAN performs better than separable-GAN with relatively narrow neural nets. The code is available at https://github.com/AilsaF/RS-GAN.

* This table does NOT show a complete list of works. The goal is to list various types of works. Only one or two works are listed as examples of that class.

perspective, we compare representative works in supervised learning with works on GANs in Tab. 1. Second, it may help to understand mode collapse. Bai et al. [7] conjectured that a lack of diversity may be caused by optimization issues, although convergence analysis works [65, 8, 34, 11] do not link non-convergence to mode collapse. Thus we suspect that mode collapse is at least partially related to sub-optimal local minima, but a formal theory is still lacking. Third, it may help to understand the training process of GANs. Even understanding a simple two-cluster experiment is challenging because the loss values of min-max optimization fluctuate during training. Global analysis can provide an additional lens for demystifying the training process. Additional related work is reviewed in Appendix A.

Challenges and our solutions. While the idea of a global analysis is natural, there are a few obstacles.
First, it is hard to follow a common path of supervised learning [38, 47, 2, 92, 27] to prove global convergence of gradient descent for GANs, because the dynamics of non-convex-non-concave games are much more complicated. Therefore, we resort to a landscape analysis. Note that our approach resembles an "equilibrium analysis" in game theory. Second, it was not clear which formulation could cure the landscape issue of JS-GAN. Wasserstein GAN (W-GAN) is a candidate, but its landscape is hard to analyze due to the extra constraints. After analyzing the issue of JS-GAN, we realized that the idea of "pairing", which is implicitly used by W-GAN, is enough to cure the issue. This leads us to consider relativistic pairing GANs (RpGANs) [41, 42] that couple the true data and generated data¹. We prove that RpGANs have a better landscape than separable-GANs (a generalization of JS-GAN). Third, it was not clear whether the theoretical finding affects practical training. We make a few conjectures based on our landscape theory and design experiments to verify them. Interestingly, the experiments match the conjectures quite well.

Our contributions. This work provides a global landscape analysis of the empirical version of GANs. Our contributions are summarized as follows:

• Does the original JS-GAN have a good landscape, provably? For JS-GAN [35], we prove that the outer-minimization problem has exponentially many sub-optimal strict local minima. Each strict local minimum corresponds to a mode-collapse situation. We also extend this result to a class of separable-GANs, covering hinge loss and least squares loss.

• Is there a way to improve the landscape, provably? We study a class of relativistic pairing GANs (RpGANs) [41] that pair the true data and the generated data in the loss function. We prove that the outer-minimization problem of RpGAN has no bad strict local minima, improving upon separable-GANs.

• Does the improved landscape lead to any empirical benefit?
Based on our theory, we predict that RpGANs are more robust to data, network width and initialization than their separable counterparts, and our experiments support our prediction. Although the empirical benefit of RpGANs was observed before [41], the aspects we demonstrate are closely related to our landscape theory. In addition, using synthetic experiments we explain why mode collapse (as bad basins) can slow down JS-GAN training.

Difference of Population Loss and Empirical Loss

Goodfellow et al. [35] proved that the population loss of GANs is convex in the space of probability densities. We highlight that this convexity highly depends on a simple property of the population loss, which may vanish in an empirical setting. Suppose $p_{\rm data}$ is the data distribution, $p_g$ is a generated distribution and $D \in C_{(0,1)}(\mathbb{R}^d)$, where $C_{(0,1)}(\mathbb{R}^d)$ is the set of continuous functions with domain $\mathbb{R}^d$ and codomain $(0, 1)$. Consider the

Intuition of Bad "Local Minima" and Separable-GAN: Consider an empirical data distribut