NeurIPS 2021
PSD Representations for Effective Probability Models
Alessandro Rudi, Carlo Ciliberto
Abstract
Finding a good way to model probability densities is key to probabilistic inference. An ideal model should be able to concisely approximate any probability density while also being compatible with two main operations: multiplication of two models (product rule) and marginalization with respect to a subset of the random variables (sum rule). In this work, we show that a recently proposed class of positive semi-definite (PSD) models for non-negative functions is particularly suited to this end. In particular, we characterize both the approximation and generalization capabilities of PSD models, showing that they enjoy strong theoretical guarantees. Moreover, we show that both the sum and product rules can be performed efficiently in closed form via matrix operations, enjoying the same versatility as mixture models. Our results open the way to applications of PSD models to density estimation, decision theory and inference.

Given two base point matrices $X \in \mathbb{R}^{n \times d}$ and $X' \in \mathbb{R}^{m \times d}$, we denote by $K_{X,X',\eta} \in \mathbb{R}^{n \times m}$ the kernel matrix with entries $(K_{X,X',\eta})_{ij} = k_\eta(x_i, x'_j)$, where $x_i$ and $x'_j$ are the $i$-th and $j$-th rows of $X$ and $X'$, respectively. When clear from context, in what follows we refer to Gaussian PSD models simply as PSD models.

Remark 1 (PSD models generalize mixture models). Mixture models (i.e., mixtures of Gaussian distributions) are a special case of PSD models. Let $A = \mathrm{diag}(a)$ be a diagonal matrix of $n$ positive weights $a \in \mathbb{R}^n_{++}$. We have
$$f(x; \mathrm{diag}(a), X, \eta) = \sum_{i=1}^n a_i\, k_\eta(x_i, x)^2 = \sum_{i=1}^n a_i\, k_{2\eta}(x_i, x),$$
i.e., a mixture of Gaussians with weights $a_i$ (up to normalization).

Remark 2 (PSD models allow negative weights). From (2), we immediately see that PSD models generalize mixture models by also allowing negative weights. Since $k_\eta(x_i, x)\, k_\eta(x_j, x) = k_{\eta/2}(x_i, x_j)\, k_{2\eta}\big(\tfrac{x_i + x_j}{2}, x\big)$, a PSD model expands as
$$f(x; A, X, \eta) = \sum_{i,j=1}^n A_{ij}\, k_{\eta/2}(x_i, x_j)\, k_{2\eta}\Big(\tfrac{x_i + x_j}{2}, x\Big),$$
i.e., a mixture of Gaussians whose weights $A_{ij}\, k_{\eta/2}(x_i, x_j)$ can be negative whenever $A$ has negative off-diagonal entries.

Operations with PSD models
In Sec. 3 we will show that PSD models can approximate a wide class of probability densities, significantly outperforming mixture models. Here we show that this improvement does not come at the expense of computation.
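Remarks 1 and 2 can be checked numerically with a short NumPy sketch. It assumes the Gaussian kernel $k_\eta(x, x') = \exp(-(x - x')^\top \mathrm{diag}(\eta)(x - x'))$ (the form consistent with the constant $c_\eta$ used in the sum rule) and the model $f(x; A, X, \eta) = K_{X,x,\eta}^\top A\, K_{X,x,\eta}$. The helper names `kernel_matrix` and `psd_model`, the random data, and the 1-D negative-weight instance are hypothetical choices made for illustration only, not code or examples from the paper.

```python
import numpy as np


def kernel_matrix(X, Xp, eta):
    """Kernel matrix K_{X,X',eta} with entries k_eta(x_i, x'_j).

    Assumed kernel: k_eta(x, x') = exp(-(x - x')^T diag(eta) (x - x')).
    X has shape (n, d), Xp has shape (m, d), eta has shape (d,).
    """
    diff = X[:, None, :] - Xp[None, :, :]  # (n, m, d) pairwise differences
    return np.exp(-np.einsum("nmd,d->nm", diff**2, eta))


def psd_model(x, A, X, eta):
    """Evaluate f(x; A, X, eta) = K_{X,x,eta}^T A K_{X,x,eta} at each row of x."""
    K = kernel_matrix(X, x, eta)  # (n, p): column p is K_{X,x_p,eta}
    return np.einsum("ip,ij,jp->p", K, A, K)  # one quadratic form per query


rng = np.random.default_rng(0)
n, d = 4, 2
X = rng.normal(size=(n, d))  # base points
eta = np.array([0.7, 1.3])   # per-dimension scales
x = rng.normal(size=(5, d))  # query points

# Remark 1: A = diag(a) gives the mixture sum_i a_i k_{2 eta}(x_i, x).
a = rng.uniform(0.5, 2.0, size=n)
mixture = kernel_matrix(X, x, 2 * eta).T @ a
assert np.allclose(psd_model(x, np.diag(a), X, eta), mixture)

# Remark 2: a general PSD A expands into Gaussians centred at the midpoints
# (x_i + x_j)/2 with scale 2*eta and weights A_ij k_{eta/2}(x_i, x_j).
G = rng.normal(size=(n, n))
A = G @ G.T                                # random PSD coefficient matrix
W = A * kernel_matrix(X, X, eta / 2)       # (n, n) mixture weights
M = (X[:, None, :] + X[None, :, :]) / 2    # (n, n, d) midpoints
expanded = kernel_matrix(M.reshape(-1, d), x, 2 * eta).T @ W.reshape(-1)
assert np.allclose(psd_model(x, A, X, eta), expanded)

# Hypothetical 1-D instance with a negative weight: base points -1 and 1, eta = 1.
A_neg = np.array([[1.0, -0.5], [-0.5, 1.0]])  # PSD (eigenvalues 0.5 and 1.5)
X_1d = np.array([[-1.0], [1.0]])
W_neg = A_neg * kernel_matrix(X_1d, X_1d, np.array([0.5]))
assert np.isclose(W_neg[0, 1], -0.5 * np.exp(-2.0))  # negative mixture weight
```

Stacking the query points as columns of a single kernel matrix lets one `einsum` evaluate the quadratic form at many points at once.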
In particular, we show that PSD models enjoy the same flexibility as mixture models: i) they are closed under key operations such as marginalization and multiplication, and ii) these operations can be performed efficiently via matrix sums and products. The derivations of the results reported below are provided in Appendix F; they follow from well-known properties of the Gaussian function.

Evaluation. Evaluating a PSD model at a point $x_0 \in \mathcal{X}$ corresponds to $f(x = x_0; A, X, \eta) = K_{X,x_0,\eta}^\top A\, K_{X,x_0,\eta}$. Moreover, partially evaluating a PSD model in one variable yields
$$f(x, y = y_0; A, [X, Y], (\eta_1, \eta_2)) = f(x; B, X, \eta_1), \quad \text{with } B = A \circ \big(K_{Y,y_0,\eta_2} K_{Y,y_0,\eta_2}^\top\big).$$
Note that $f(x; B, X, \eta_1)$ is still a PSD model, since $B$ is positive semidefinite.

Sum Rule (Marginalization and Integration). The integral of a PSD model can be computed as
$$\int f(x; A, X, \eta)\, dx = c_{2\eta}\, \mathrm{Tr}\big(A\, K_{X,X,\eta/2}\big),$$
where $c_\eta = \pi^{d/2} \det(\mathrm{diag}(\eta))^{-1/2}$. This is particularly useful for modeling probability densities with PSD models: letting $Z = \int f(x; A, X, \eta)\, dx$, the function $f(x; A/Z, X, \eta) = \frac{1}{Z} f(x; A, X, \eta)$ is a probability density. Integrating out only one variable of a PSD model yields the sum rule: the following integral is again a PSD model,
$$\int f(x, y; A, [X, Y], (\eta, \eta'))\, dx = f(y; B, Y, \eta'), \quad \text{with } B = c_{2\eta}\, A \circ K_{X,X,\eta/2}. \tag{5}$$
The result above shows that we can efficiently marginalize a PSD model with respect to one variable by means of an entry-wise (Hadamard) multiplication between two matrices.
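The closed-form operations above can be sketched in a few lines of NumPy and checked against a Riemann sum on a 1-D grid. This is a minimal sketch assuming the same Gaussian kernel $k_\eta(x, x') = \exp(-(x - x')^\top \mathrm{diag}(\eta)(x - x'))$, with the joint model on $(x, y)$ built from concatenated base points $[X, Y]$ and scales $(\eta_1, \eta_2)$. All function names (`kernel_matrix`, `psd_model`, `c`, `partial_eval`, `integral`, `marginalize_x`) and the random data are hypothetical and only serve as illustration.

```python
import numpy as np


def kernel_matrix(X, Xp, eta):
    # Assumed Gaussian kernel k_eta(x, x') = exp(-(x - x')^T diag(eta) (x - x')).
    diff = X[:, None, :] - Xp[None, :, :]
    return np.exp(-np.einsum("nmd,d->nm", diff**2, eta))


def psd_model(x, A, X, eta):
    # Evaluation: f(x; A, X, eta) = K_{X,x,eta}^T A K_{X,x,eta}, row-wise on x.
    K = kernel_matrix(X, x, eta)
    return np.einsum("ip,ij,jp->p", K, A, K)


def c(eta):
    # c_eta = pi^{d/2} det(diag(eta))^{-1/2}, the integral of exp(-x^T diag(eta) x).
    return np.pi ** (len(eta) / 2) / np.sqrt(np.prod(eta))


def partial_eval(A, Y, eta2, y0):
    # Fixing y = y0 leaves f(x; B, X, eta1) with B = A o (v v^T), v = K_{Y,y0,eta2}.
    v = kernel_matrix(Y, y0[None, :], eta2)[:, 0]
    return A * np.outer(v, v)


def integral(A, X, eta):
    # Integral of f(x; A, X, eta) over R^d: c_{2 eta} Tr(A K_{X,X,eta/2}).
    return c(2 * eta) * np.trace(A @ kernel_matrix(X, X, eta / 2))


def marginalize_x(A, X, eta1):
    # Sum rule (5): integrating out x gives f(y; B, Y, eta2), B = c_{2 eta1} A o K_{X,X,eta1/2}.
    return c(2 * eta1) * A * kernel_matrix(X, X, eta1 / 2)


rng = np.random.default_rng(1)
n = 3
X, Y = rng.normal(size=(n, 1)), rng.normal(size=(n, 1))
eta1, eta2 = np.array([0.8]), np.array([1.5])
G = rng.normal(size=(n, n))
A = G @ G.T                                 # random PSD coefficient matrix
XY, eta = np.hstack([X, Y]), np.concatenate([eta1, eta2])

xs = np.linspace(-8.0, 8.0, 4001)[:, None]  # 1-D grid for Riemann sums in x
dx = xs[1, 0] - xs[0, 0]

# Partial evaluation at y0 = 0.3 matches the joint model restricted to y = y0.
y0 = 0.3
joint = psd_model(np.hstack([xs, np.full_like(xs, y0)]), A, XY, eta)
B = partial_eval(A, Y, eta2, np.array([y0]))
assert np.allclose(joint, psd_model(xs, B, X, eta1))

# Integration and normalization: A / Z defines a probability density.
Z = integral(A, X, eta1)
assert np.isclose(psd_model(xs, A, X, eta1).sum() * dx, Z)
assert np.isclose(psd_model(xs, A / Z, X, eta1).sum() * dx, 1.0)

# Marginalization: integrating the joint model over x gives a PSD model in y.
B_marg = marginalize_x(A, X, eta1)
for y in (-0.5, 0.0, 1.2):
    vals = psd_model(np.hstack([xs, np.full_like(xs, y)]), A, XY, eta)
    assert np.isclose(vals.sum() * dx, psd_model(np.array([[y]]), B_marg, Y, eta2)[0])
```

Each operation needs only an $n \times n$ kernel matrix plus a trace or an entry-wise product; the grid sums appear solely to confirm the closed forms.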