NeurIPS2021
On the Power of Edge Independent Graph Models
Sudhanshu Chanpuriya, Cameron Musco, Konstantinos Sotiropoulos, Charalampos E. Tsourakakis
被引用 16 次
摘要
Why do many modern neural-network-based graph generative models fail to reproduce typical real-world network characteristics, such as high triangle density? In this work we study the limitations of edge independent random graph models, in which each edge is added to the graph independently with some probability. Such models include both the classic Erdös-Rényi and stochastic block models, as well as modern generative models such as NetGAN, variational graph autoencoders, and CELL. We prove that subject to a bounded overlap condition, which ensures that the model does not simply memorize a single graph, edge independent models are inherently limited in their ability to generate graphs with high triangle and other subgraph densities. Notably, such high densities are known to appear in real-world social networks and other graphs. We complement our negative results with a simple generative model that balances overlap and accuracy, performing comparably to more complex models in reconstructing many graph statistics. Introduction Our work centers on edge independent graph models, in which each edge (i, j) is added to the graph independently with some probability P ij ∈ [0, 1]. Formally, Definition 1 (Edge Independent Graph Model). For any symmetric matrix P ∈ [0, 1] n×n let G(P ) be the distribution over undirected unweighted graphs where G ∼ G(P ) contains edge (i, j) independently, with probability P ij . I.e., p(G) Edge independent models encompass many classic random graph models. This includes the Erdös-Rényi model, where for all i = j, P ij = p for some fixed p ∈ [0, 1] [11]. It also includes the stochastic block model where P ij = p if two nodes are in the same community and P ij = q if two nodes are in different communities for some fixed p, q ∈ [0, 1] with q < p [31]. Other examples include e.g., the Chung-Lu configuration model [6], stochastic Kronecker graphs [19]. Recently, significant attention has focused on graph generative models, which seek to learn a distribution over graphs that share similar properties to a given training graph, or set of graphs. Many algorithms parameterize this distribution as an edge independent model or closely related distribution. E.g., NetGAN and the closely related CELL model both produce P ∈ [0, 1] n×n and then sample edges independently without replacement with probabilities proportional to its entries, ensuring that at least one edge is sampled adjacent to each node [4, 25] . Variational Graph Autoencoders (VGAE), GraphVAE, Graphite, and MolGAN are also all based on edge independent models [18, 30, 9, 14] . Given their popularity in both classical and modern graph generative models, it is natural to ask: