NeurIPS2021
Sharp Impossibility Results for Hyper-graph Testing
Jiashun Jin, Zheng Tracy Ke, Jiajun Liang
7 citations
Abstract
In a broad Degree-Corrected Mixed-Membership (DCMM) setting, we test whether a non-uniform hypergraph has only one community or has multiple communities. Since both the null and alternative hypotheses have many unknown parameters, the challenge is, given an alternative, how to identify the null that is hardest to separate from the alternative. We approach this by proposing a degree matching strategy where the main idea is leveraging the theory for tensor scaling to create a least favorable pair of hypotheses. We present a result on standard minimax lower bound theory and a result on Region of Impossibility (which is more informative than the minimax lower bound). We show that our lower bounds are tight by introducing a new test that attains the lower bound up to a logarithmic factor. We also discuss the case where the hypergraphs may have mixed-memberships. How to derive a sharp lower bound for global testing (and especially, to identify a simple quantity that fully characterizes the lower bound) in our setting is a rather challenging problem. Our model is a non-uniform hypergraph model (see Section 3), which consists of hypergraphs of order 2, 3, . . . , M , and each layer consists of many unknown parameters (θ, P, h, F ). Existing works on lower bounds have been focused on uniform hypergraphs, and non-uniform hypergraphs are much less studied. Even for uniform hypergraphs, existing works have been focused on the the special SBM as in Example 1, not the more general DCMM model. For example, Yuan et al. [22] derived the lower bounds for global testing with the 2-parameter SBM, focusing on the extremely sparse case. Ahn et al. [2] provided lower bound results for exactly recovering the communities (see Liang et al. [19] and Kim et al. [16] for related settings) with a similar model. Lin et al. [20] and Chien et al. [7] used the 3-parameter SBM in Example 1 for study of the lower bounds for community detection. While these papers are very interesting, their lower bounds are characterized by only 2 or 3 parameters (i.e., α n , ρ n , τ n ) assumed in their models. For the much broader tensor-DCMM model considered here, we have many parameters θ, P, h, F (θ is an n-vector and P is a tensor), and how to extend existing results to the tensor-DCMM setting here is a challenging problem.