NeurIPS2020

Limits on Testing Structural Changes in Ising Models

Aditya Gangrade, Bobak Nazer, Venkatesh Saligrama

摘要

We present novel information-theoretic limits on detecting sparse changes in Ising models, a problem that arises in many applications where network changes can occur due to some external stimuli. We show that the sample complexity for detecting sparse changes, in a minimax sense, is no better than learning the entire model even in settings with local sparsity. This is a surprising fact in light of prior work rooted in sparse recovery methods, which suggest that sample complexity in this context scales only with the number of network changes. To shed light on when change detection is easier than structured learning, we consider testing of edge deletion in forest-structured graphs, and high-temperature ferromagnets as case studies. We show for these that testing of small changes is similarly hard, but testing of large changes is well-separated from structure learning. These results imply that testing of graphical models may not be amenable to concepts such as restricted strong convexity leveraged for sparsity pattern recovery, and algorithm development instead should be directed towards detection of large changes. Sparse-Recovery-Based Structural Testing Methods. More directly related to our work, are those that are based on direct change estimation (DCE) [FB16; Liu+14; Liu+17; LFS17; KLK19], which attempt to directly characterize the difference of parameters δ * = θ P -θ Q by leveraging sparsity of δ * . These works leverage the 'KL Importance Estimation Procedure' (KLIEP), the key insight of which is that the log-likelihood ratios can be written in a form that is suggestive of expressions from sparse-pattern recovery methods, to define the empirical loss function where Ê denotes an empirical mean, and δ is sparse. The second term, which is the only nonlinear term, is reminiscent of normalization factors in graphical models. In this context, it is useful to recall the key ideas from high-dimensional sparse estimation theory (see [Neg+12] ), which has served as a powerful generic tool. At a high-level, these results show that for a loss function L(δ) paired with a decomposable regulariser (such as an ℓ 1 norm on δ), if the loss function satisfies restricted strong convexity, namely, strong convexity only in a suitable descent error set, as characterised by the regulariser and the optimal value δ * , minimising the penalised empirical loss leads to a non-trivial estimation error bound. Leveraging these concepts of high-dimensional estimation, and exploiting sparsity, the sparse DCE works show that testing can be done in O(poly(s) log p) samples (for any P, Q!), which is further much smaller than the number needed for SL, a result which contradicts bounds we derive in this paper. The situation warrants further discussion. From a technical perspective, the sample complexity gains of these methods arise from assuming law-dependent quantities to be constants. For example, [Liu+14; Liu+17] require that for u ≤ δ * , ∇ 2 L(δ * + u) λ 1 I, and that for S the support of δ * , the submatrix λ 2 I, where λ 1 , λ 2 are constants independent of P, Q. [FB16] removes the second condition, and shows that L has the λ 2 -RSC property, where λ 2 is claimed to be independent of P, Q. In each case, sample costs increase with λ 1 and λ -1 2 . However, the assertion that λ 1 , λ 2 are independent of (P, Q) cannot hold in general -the only non-linear part in L is log ÊP [exp X T δX ], which clearly depends on P ! This dependence also occurs if P is known. Thus, the 'constants' λ 1 , λ 2 are affected by the properties of P . More generically, the efficacy of sparse recovery techniques is questionable in this scenario. Since the data is essentially distinct across samples, and internally dependent, and since the sparse changes, δ * , and the underlying distributions interact, it is unclear if meaningful notions of design matrix that allow testing with sub-recovery sample costs can be developed. Nevertheless, it is an interesting question to understand what additional assumptions on P, Q or topological restrictions are useful in terms of benefiting from sparsity. Our results suggest that these conditions are stronger than typical incoherence conditions such as high temperatures, and further that the topological restrictions demand more than just 'simplicity' of the graphs.