NeurIPS2023

Neural Injective Functions for Multisets, Measures and Graphs via a Finite Witness Theorem

Tal Amir, Steven J. Gortler, Ilai Avni, Ravina Ravina, Nadav Dym

35 citations

Abstract

Injective multiset functions have a key role in the theoretical study of machine learning on multisets and graphs. Yet, there remains a gap between the provably injective multiset functions considered in theory, which typically rely on polynomial moments, and the multiset functions used in practice, which rely on neural moments -whose injectivity on multisets has not been studied to date. In this paper, we bridge this gap by showing that moments of neural networks do define injective multiset functions, provided that an analytic non-polynomial activation is used. The number of moments required by our theory is optimal essentially up to a multiplicative factor of two. To prove this result, we state and prove a finite witness theorem, which is of independent interest. As a corollary to our main theorem, we derive new approximation results for functions on multisets and measures, and new separation results for graph neural networks. We also provide two negative results: (1) moments of piecewise-linear neural networks cannot be injective multiset functions; and (2) even when momentbased multiset functions are injective, they can never be bi-Lipschitz. If f is injective, we say that f is moment-injective on M ≤n (Ω). Naturally, injectivity on M ≤n (Ω) implies injectivity on subsets of this space, such as the space of measures in M ≤n (Ω) that are probability measures, or that have only positive weights. In particular, if f is moment-injective on M ≤n (Ω), then it is moment-injective on S ≤n (Ω). To summarize, the main questions we focus on in this paper are: Main Questions: (a) Under what conditions is an MLP f moment-injective on spaces of multisets S ≤n (Ω) or measures M ≤n (Ω)? (b) How many neurons are needed to achieve this injectivity? Main Results Interestingly, we find that the answers to these two questions largely depend on the activation function. Consider shallow neural networks f : R d → R m of the form with the activation function σ : R → R applied entrywise to Ax + b. Suppose that σ is analytic and non-polynomial; such activations include the sigmoid, softplus, tanh, swish and sin. In Section 3 we show that for a large enough m, such networks f (x; A, b) with random parameters A,b are moment-injective on M ≤n (Ω) and on S ≤n (Ω); namely, their induced moment functions f of (2) are injective. This holds for various natural choices of Ω. The embedding dimension m required in (3) depends on the dimension d of Ω: For Ω = R d , to achieve injectivity on S ≤n (Ω) or M ≤n (Ω), it suffices to take m = 2nd + 1 or m = 2n(d + 1) + 1 respectively. When Ω is countable, m = 1 or m = 2n + 1 are sufficient (corresponding to d = 0). In Appendix C we show that in all these cases, these embedding dimensions are optimal essentially up to a multiplicative factor of two. These results are summarized in Table 1 . In Appendix C we also discuss examples where the optimal embedding cardinality for M ≤n (R) is obtained. At the core of our poof of moment injectivity is a theorem which we name the finite witness theorem. This theorem enables reducing an infinite family of analytic equality constraints F ( This theorem generalizes the results in [9] , where a special case of it was proved for semialgebraic domains and functions. The theorem we prove here (see Appendix A) applies to a much wider class of domains and functions, among which are analytic functions. In addition to our main result, we use the finite witness theorem to prove moment injectivity of Gaussian functions (Proposition 3.5) and deep MLPs (Proposition 3.6), and we believe it shall find additional applications beyond those discussed in this work. Negative Results We also prove two negative results for moment-based multiset functions: We show that in contrast to analytic activations, with piecewise linear (PwL) activations, such as ReLU, leaky ReLU and hard arctan, moment injectivity on spaces of measures M ≤n (Ω) with infinite Ω is impossible. On multiset spaces S ≤n (Ω), moment injectivity with PwL activations can be obtained for some irregular, countable Ω, such as the alphabet Ω = Σ α defined in Appendix B, but not for infinite alphabets that arise naturally, like Ω = R d , Z d or (0, 1) d . These results are summarized in the bottom row of Table 1 . The second negative result is that while moments of MLPs with analytic activations can be injective, they can never be stable in the bi-Lipschitz sense. This points to a possible advantage